spark-submit java.lang.ClassNotFoundException - scala

Spark 2.7, Scala 2.12.7. When I use spark-submit to submit a simple WordCount project, I have made sure the package and class name are correct, but I still get this error:
java.lang.ClassNotFoundException
My submit command:
./bin/spark-submit --master spark://localhost.localdomain:7077 --class sparkTes.WordCount.scala /java/spark/scala.jar
My Spark code:
package sparkTes

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    val input = sc.textFile("/java/text/scala.md", 2).cache()
    val lines = input.flatMap(line => line.split(" "))
    val count = lines.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    val output = count.saveAsTextFile("/java/text/WordCount")
  }
}
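A likely cause, judging from the submit command above (this diagnosis is an inference, not from the original post): --class expects a fully qualified class name, not a file name, so the trailing .scala makes Spark look for a class literally named sparkTes.WordCount.scala. A corrected command would be:

./bin/spark-submit --master spark://localhost.localdomain:7077 --class sparkTes.WordCount /java/spark/scala.jar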


Error "file not exists" even though the file is present when passing the filename as an arg in spark-submit

I am passing a config filename as an argument to spark2-submit, but I get a "file not exists" error even though the file exists. If I hard-code the file name, it works fine.
spark2-submit --files /data/app/Data_validation/target/input.conf --class "QualityCheck" DC_framework-jar-with-dependencies.jar "input.conf"
Code:
import java.io.File
import com.typesafe.config.ConfigFactory
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import org.apache.spark.sql.SparkSession

object QualityCheck {
  def main(args: Array[String]): Unit = {
    val configFile = SparkFiles.get("input.conf")
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)
    val spark = SparkSession.builder.getOrCreate()
    println(configFile)
    val file = new File(configFile)
    if (file.exists) {
      val config = ConfigFactory.parseFile(file)
    } else {
      println("Configuration file does not exist")
    }
  }
}
Output:
/data/app/Data_validation/target/input.conf
Configuration file does not exist
Config(SimpleConfigObject({}))
Please help!
Try the code below: pass the config file path as a program argument instead of staging it with --files.
spark2-submit --class "QualityCheck" DC_framework-jar-with-dependencies.jar "/data/app/Data_validation/target/input.conf"
import java.io.File
import com.typesafe.config.ConfigFactory
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object QualityCheck {
  def main(args: Array[String]): Unit = {
    val file = new File(args(0))
    if (file.exists) {
      val config = ConfigFactory.parseFile(file)
    } else {
      println("Configuration file does not exist")
    }
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)
    val spark = SparkSession.builder.getOrCreate()
  }
}
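If you would rather keep --files, one hedged guess at the original failure is that SparkFiles.get was called before the SparkContext existed, so it could not resolve the staged copy of the file. A minimal sketch with the order reversed (the object name here is hypothetical):

import java.io.File
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object QualityCheckWithFiles {
  def main(args: Array[String]): Unit = {
    // create the context first, so files passed via --files are staged
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)
    // only then resolve the local copy of the staged file
    val configFile = SparkFiles.get("input.conf")
    println(configFile + " exists: " + new File(configFile).exists)
  }
}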

H2O fails on H2OContext.getOrCreate

I'm trying to write a sample program in Scala/Spark/H2O. The program compiles, but throws an exception in H2OContext.getOrCreate:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._

object App1 extends App {
  val conf = new SparkConf()
  conf.setAppName("AppTest")
  conf.setMaster("local[1]")
  conf.set("spark.executor.memory", "1g")
  val sc = new SparkContext(conf)
  val spark = SparkSession.builder
    .master("local")
    .appName("ApplicationController")
    .getOrCreate()
  import spark.implicits._
  val h2oContext = H2OContext.getOrCreate(spark) // <--- error here
  import h2oContext.implicits._
  val rawData = sc.textFile("c:\\spark\\data.csv")
  val data = rawData.map(line => line.split(',').map(_.toDouble))
  val response: RDD[Int] = data.map(row => row(0).toInt)
  val str = "count: " + response.count()
  val h2oResponse: H2OFrame = response.toDF
  sc.stop()
  spark.stop()
}
This is the exception log:
Exception in thread "main" java.lang.RuntimeException: When using the Sparkling Water as Spark package via --packages option, the 'no.priv.garshol.duke:duke:1.2' dependency has to be specified explicitly due to a bug in Spark dependency resolution.
    at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:117)
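The exception message itself spells out the fix: when Sparkling Water is pulled in via --packages, the duke artifact has to be listed explicitly as well. A hedged example (the Sparkling Water coordinates and version below are assumptions; match them to the version you actually use):

spark-submit \
  --packages ai.h2o:sparkling-water-core_2.11:2.1.12,no.priv.garshol.duke:duke:1.2 \
  --class App1 target/app1.jar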

Spark-submit cannot access local file system

Really simple Scala code fails at the first count() method call.
import java.io.File
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  // create Spark context with Spark configuration
  val sc = new SparkContext(new SparkConf().setAppName("Spark File Count"))
  // recursiveListFiles is a helper defined elsewhere in the project (not shown here)
  val fileList = recursiveListFiles(new File("C:/data")).filter(_.isFile).map(file => file.getName())
  val filesRDD = sc.parallelize(fileList)
  val linesRDD = sc.textFile("file:///temp/dataset.txt")
  val lines = linesRDD.count()
  val files = filesRDD.count()
}
I don't want to set up an HDFS installation for this right now. How do I configure Spark to use the local file system? This works with spark-shell.
To read a file from the local filesystem (from a Windows directory), use the pattern below.
val fileRDD = sc.textFile("C:\\Users\\Sandeep\\Documents\\test\\test.txt");
See the sample working program below that reads data from the local file system.
package com.scala.example

import org.apache.spark._

object Test extends Serializable {
  val conf = new SparkConf().setAppName("read local file")
  conf.set("spark.executor.memory", "100M")
  conf.setMaster("local")
  val sc = new SparkContext(conf)
  val input = "C:\\Users\\Sandeep\\Documents\\test\\test.txt"

  def main(args: Array[String]): Unit = {
    val fileRDD = sc.textFile(input)
    val counts = fileRDD.flatMap(line => line.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    // stop the Spark context
    sc.stop()
  }
}
val sc = new SparkContext(new SparkConf().setAppName("Spark File
Count")).setMaster("local[8]")
might help
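For completeness, a minimal self-contained sketch combining both suggestions: an explicit file:/// URI plus a local master (the path is illustrative only). Note that a file:// path must exist on every node in a real cluster, which is why this pattern is mostly useful with a local master:

import org.apache.spark.{SparkConf, SparkContext}

object LocalFileCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark File Count").setMaster("local[8]")
    val sc = new SparkContext(conf)
    // file:/// forces the local filesystem instead of the default (e.g. HDFS)
    val lines = sc.textFile("file:///C:/temp/dataset.txt")
    println("line count: " + lines.count())
    sc.stop()
  }
}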

Class not found in simple spark application

I'm new to Spark and wrote a very simple Spark application in Scala as below:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object test2object {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"
    val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}
I'm coding in Scala IDE and added spark-assembly.jar to my build path. I generate a jar file from my project and submit it to my local Spark cluster using this command: spark-submit --class test2object --master local[2] ./file.jar. But I get this error message:
Exception in thread "main" java.lang.NoSuchMethodException: test2object.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1665)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:649)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
What is wrong here?
p.s. my source code is under the project root directory (project/test2object.scala)
I haven't used Spark 0.9.1 before, but I believe the problem comes from this line of code:
val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating", List("target/scala-2.10/simple-project_2.10-1.0.jar"))
If you change it to this:
val conf = new SparkConf().setAppName("Simple App")
val sc = new SparkContext(conf)
This will work.
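Putting the fix into the original program (a sketch; only the SparkContext construction changes, and the master now comes from the spark-submit command line instead of being hard-coded):

import org.apache.spark.{SparkConf, SparkContext}

object test2object {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"
    val conf = new SparkConf().setAppName("Simple App")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}

Submitted exactly as before: spark-submit --class test2object --master local[2] ./file.jar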

What are all the ways we can run Scala code in Apache Spark?

I know there are two ways to run Scala code in Apache Spark:
1. Using spark-shell
2. Making a jar file from our project and using spark-submit to run it
Is there any other way to run Scala code in Apache Spark? For example, can I run a Scala object (e.g. object.scala) in Apache Spark directly?
Thanks
1. Using spark-shell
2. Making a jar file from our project and using spark-submit to run it
3. Running a Spark job programmatically:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// inside the main method of your application class (App below):
String sourcePath = "hdfs://hdfs-server:54310/input/*";
SparkConf conf = new SparkConf().setAppName("TestLineCount");
conf.setJars(new String[] { App.class.getProtectionDomain()
        .getCodeSource().getLocation().getPath() });
conf.setMaster("spark://spark-server:7077");
conf.set("spark.driver.allowMultipleContexts", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> log = sc.textFile(sourcePath);
JavaRDD<String> lines = log.filter(x -> true);
System.out.println(lines.count());
Scala version:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)

    val logFile = "/tmp/logs.txt"
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    println("line count: " + logData.count())
  }
}
For more detail, refer to this blog post.
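One more option worth noting for the "run object.scala directly" part of the question: spark-shell can evaluate a plain .scala file, either at startup or from inside a session (script.scala below is a hypothetical file name):

spark-shell -i script.scala

or, inside a running spark-shell session:

scala> :load script.scala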