Failed to load class SparkDemo on spark-submit - scala

I am trying to create a simple Hello World program in Scala to run on Apache Spark 3.1.1. However, I get a "Failed to load class SparkDemo" error on spark-submit.
I am using the command below:
spark-submit --class SparkDemo --master local --executor-memory 800m /Users/souravhazra/Downloads/SparkDemo.jar
This is my Scala code:
object SparkDemo {
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }
}
This is my build.sbt
I have been stuck on this issue for a couple of days. Please help. Thanks in advance.
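For reference, a minimal sbt setup for this example might look like the sketch below; since the question does not show the build.sbt contents, the Scala version and the "provided" scope here are assumptions rather than the asker's actual settings.

// Hypothetical build.sbt for the SparkDemo example (not the asker's original file)
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.12.10" // Spark 3.1.1 is published for Scala 2.12
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1" % "provided"

With a layout like this, sbt package produces a jar containing the top-level object SparkDemo, which is the name that --class must match; a mismatch between --class and the object's fully qualified name (for example, a forgotten package prefix) is a common cause of the "Failed to load class" error.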

Related

SparkException: Cannot load main class from JAR file:/root/master

I want to use spark-submit to submit my Spark application. The Spark version is 2.4.3. I can run the application with java -jar scala.jar, but an error occurs when I run spark-submit master local --class HelloWorld scala.jar.
I have tried changing the master, including local and spark://ip:port, but with no result. The error below is always thrown, no matter how I modify the path of the jar.
Here is the code of my application.
import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("begin~!")
    // use val (not def) so only a single SparkConf/SparkContext is created
    val conf = new SparkConf().setAppName("first").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(Array(1, 2, 3))
    println(rdd.count())
    println("Hello World")
    sc.stop()
  }
}
When I use spark-submit, the error below occurs.
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/root/master
at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am very sorry; the reason for the error is that I forgot to add '--' before master, so spark-submit treated the word 'master' as the application jar, which is why the error refers to JAR file:/root/master. Running the application with spark-submit --master local --class HelloWorld scala.jar works fine.

Error in running Scala in terminal: "object apache is not a member of package org"

I'm using Sublime to write my first Scala program, and I'm using the terminal to run it.
First I use the scalac assignment2.scala command to compile it, but it shows the error message: "error: object apache is not a member of package org"
How can I fix it?
This is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object assignment2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("assignment2")
    val sc = new SparkContext(conf)
    val input = sc.parallelize(List(1, 2, 3, 4))
    val result = input.map(x => x * x)
    println(result.collect().mkString(","))
  }
}
Where are you trying to submit the job from? To run any Spark application you need to submit it from bin/spark-submit in your Spark installation directory, or you need to have SPARK_HOME set in your environment, which you can refer to when submitting.
Actually you can't compile a Spark Scala file directly with scalac, because compiling your Scala class requires the Spark library on the classpath. So to execute your Scala file you need spark-shell. To execute your Spark Scala file inside spark-shell, please follow the steps below:
Open your spark-shell using the following command:
'spark-shell --master yarn-client'
Load your file with its exact location:
':load File_Name_With_Absolute_path'
Run your main method using the class name: 'ClassName.main(null)'
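One caveat when running this particular program inside spark-shell: the shell already provides a SparkContext, so new SparkContext(conf) will fail with "Only one SparkContext may be running in this JVM". A hedged variant of the question's code that works both under spark-submit and when loaded into the shell is to obtain the context with SparkContext.getOrCreate:

import org.apache.spark.{SparkConf, SparkContext}

object assignment2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("assignment2")
    // Reuse the shell's existing context if there is one, otherwise create a new one
    val sc = SparkContext.getOrCreate(conf)
    val input = sc.parallelize(List(1, 2, 3, 4))
    val result = input.map(x => x * x)
    println(result.collect().mkString(","))
  }
}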

How can I add configuration files to a Spark job running in YARN-CLUSTER mode?

I am using Spark 1.6.0. I want to upload a file using the --files flag and read the file content after initializing the Spark context.
My spark-submit command syntax looks like below:
spark-submit \
--deploy-mode yarn-cluster \
--files /home/user/test.csv \
/home/user/spark-test-0.1-SNAPSHOT.jar
I read the Spark documentation and it suggested using SparkFiles.get("test.csv"), but this is not working in yarn-cluster mode.
If I change the deploy mode to local, the code works fine, but I get a file-not-found exception in yarn-cluster mode.
I can see in the logs that my file is uploaded to the hdfs://host:port/user/guest/.sparkStaging/application_1452310382039_0019/test.csv directory, and SparkFiles.get is trying to look for the file in /tmp/test.csv, which is not correct. If someone has used this successfully, please help me solve this.
Spark submit command
spark-submit \
--deploy-mode yarn-client \
--files /home/user/test.csv \
/home/user/spark-test-0.1-SNAPSHOT.jar /home/user/test.csv
Read file in main program
import java.io.FileInputStream

def main(args: Array[String]): Unit = {
  // args(0) is /home/user/test.csv; in yarn-client mode the driver runs locally, so the path is readable
  val fis = new FileInputStream(args(0))
  // read the content of the file
}
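For yarn-cluster mode specifically, a file shipped with --files is localized into the working directory of the driver and executor containers, so one approach (a sketch, not taken from the answer above) is to open the file by its bare name instead of the client-side path:

import scala.io.Source

object ReadShippedFile { // hypothetical object name, for illustration only
  def main(args: Array[String]): Unit = {
    // With --files /home/user/test.csv in cluster mode, test.csv is localized
    // into the container's working directory, so the bare name resolves
    val lines = Source.fromFile("test.csv").getLines().toList
    lines.foreach(println)
  }
}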

ClassNotFoundException: com.databricks.spark.csv.DefaultSource

I am trying to export data from Hive using Spark with Scala, but I am getting the following error.
Caused by: java.lang.ClassNotFoundException:com.databricks.spark.csv.DefaultSource
My Scala script is shown below.
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val df = sqlContext.sql("SELECT * FROM sparksdata")
df.write.format("com.databricks.spark.csv").save("/root/Desktop/home.csv")
I have also tried this command, but the issue is still not resolved; please help me.
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
If you wish to run that script the way you are running it, you'll need to use --jars for local jars or --packages for a remote repository when you run the command.
So running the script should look like this:
spark-shell -i /path/to/script.scala --packages com.databricks:spark-csv_2.10:1.5.0
If you also want to stop the spark-shell after the job is done, you'll need to add:
System.exit(0)
at the end of your script.
PS: You won't need to fetch this dependency with Spark 2.+, since CSV support is built into Spark SQL.
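To illustrate that last point, here is a hedged sketch (assuming Spark 2.x or later, where CSV support is part of Spark SQL) of the same export without the external package; the object name is made up for the example:

import org.apache.spark.sql.SparkSession

object ExportHiveToCsv { // hypothetical object name
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExportHiveToCsv")
      .enableHiveSupport()
      .getOrCreate()
    val df = spark.sql("SELECT * FROM sparksdata")
    // The built-in csv format replaces com.databricks.spark.csv in Spark 2+
    df.write.format("csv").save("/root/Desktop/home.csv")
    spark.stop()
  }
}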

SBT : Running Spark job on remote cluster from sbt

I have a Spark job (let's call it WordCount) written in Scala, which I am able to run in the following ways:
Run on a local Spark instance from within sbt:
sbt> runMain WordCount [InputFile] [OutputDir] local[*]
Run on a remote Spark cluster by spark-submitting the jar:
sbt> package
$> spark-submit --master spark://192.168.1.1:7077 --class WordCount target/scala-2.10/wordcount_2.10-1.5.0-SNAPSHOT.jar [InputFile] [OutputDir]
Code:
// get arguments
val inputFile = args(0)
val outputDir = args(1)
// if a 3rd argument is defined then use it as the master
val conf =
  if (args.length == 3) new SparkConf().setAppName("WordCount").setMaster(args(2))
  else new SparkConf().setAppName("WordCount")
val sc = new SparkContext(conf)
How can I run this job on a remote Spark cluster from sbt?
There is an sbt plugin for spark-submit: https://github.com/saurfang/sbt-spark-submit
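If the plugin is not an option, another approach (a sketch, not part of the original answer) builds on the fact that the code already accepts a master URL as its third argument: run the job straight from sbt against the remote master and ship the packaged jar to the executors with SparkConf.setJars. The object below is a hypothetical variant of the question's WordCount.

import org.apache.spark.{SparkConf, SparkContext}

object WordCountRemote { // hypothetical variant of the question's WordCount
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster(args(2)) // e.g. spark://192.168.1.1:7077, passed from sbt> runMain
      // Ship the jar built by sbt> package so remote executors can load the classes
      .setJars(Seq("target/scala-2.10/wordcount_2.10-1.5.0-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)
    // ... the word-count logic from the original job goes here ...
    sc.stop()
  }
}

Run sbt> package first, then sbt> runMain WordCountRemote [InputFile] [OutputDir] spark://192.168.1.1:7077.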