Spark Runtime Error - ClassDefNotFound: SparkConf - scala

After installing and building Apache Spark (albeit with quite a few warnings), the compilation of our Spark application (using "sbt package") completes successfully. However, when we try to run the application with the spark-submit script, we get a runtime error saying that the SparkConf class definition was not found. The SparkConf.scala file is present on our system, but it seems as if it is not being built correctly. Any ideas on how to solve this?
user#compname:~/Documents/TestApp$ /opt/Spark/spark-1.4.0/bin/spark-submit --master local[4] --jars /opt/Spark/spark-1.4.0/jars/elasticsearch-hadoop-2.1.0.Beta2.jar target/scala-2.11/sparkesingest_2.11-1.0.0.jar ~/Desktop/CSV/data.csv es-index localhost
Warning: Local jar /opt/Spark/spark-1.4.0/jars/elasticsearch-hadoop-2.1.0.Beta2.jar does not exist, skipping.
log4j:WARN No appenders could be found for logger (App).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/01 13:56:58 INFO SparkContext: Running Spark version 1.4.0
15/07/01 13:56:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/01 13:56:59 WARN Utils: Your hostname, compname resolves to a loopback address: 127.0.1.1; using [IP ADDRESS] instead (on interface eth0)
15/07/01 13:56:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/07/01 13:56:59 INFO SecurityManager: Changing view acls to: user
15/07/01 13:56:59 INFO SecurityManager: Changing modify acls to: user
15/07/01 13:56:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)
15/07/01 13:56:59 INFO Slf4jLogger: Slf4jLogger started
15/07/01 13:56:59 INFO Remoting: Starting remoting
15/07/01 13:56:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@[IP ADDRESS]]
15/07/01 13:56:59 INFO Utils: Successfully started service 'sparkDriver' on port 34276.
15/07/01 13:56:59 INFO SparkEnv: Registering MapOutputTracker
15/07/01 13:56:59 INFO SparkEnv: Registering BlockManagerMaster
15/07/01 13:56:59 INFO DiskBlockManager: Created local directory at /tmp/spark-c206e297-c2ef-4bbf-9bd2-de642804bdcd/blockmgr-8d273f32-589e-4f55-98a2-cf0322a05d45
15/07/01 13:56:59 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/07/01 13:56:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-c206e297-c2ef-4bbf-9bd2-de642804bdcd/httpd-f4c3c67a-d058-4aba-bd65-5352feb5f12e
15/07/01 13:56:59 INFO HttpServer: Starting HTTP Server
15/07/01 13:56:59 INFO Utils: Successfully started service 'HTTP file server' on port 33599.
15/07/01 13:56:59 INFO SparkEnv: Registering OutputCommitCoordinator
15/07/01 13:56:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/01 13:56:59 INFO SparkUI: Started SparkUI at http://[IP ADDRESS]:4040
15/07/01 13:57:00 ERROR SparkContext: Jar not found at file:/opt/Spark/spark-1.4.0/jars/elasticsearch-hadoop-2.1.0.Beta2.jar
15/07/01 13:57:00 INFO SparkContext: Added JAR file:/home/user/Documents/TestApp/target/scala-2.11/sparkesingest_2.11-1.0.0.jar at http://[IP ADDRESS]:33599/jars/sparkesingest_2.11-1.0.0.jar with timestamp 1435784220028
15/07/01 13:57:00 INFO Executor: Starting executor ID driver on host localhost
15/07/01 13:57:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44746.
15/07/01 13:57:00 INFO NettyBlockTransferService: Server created on 44746
15/07/01 13:57:00 INFO BlockManagerMaster: Trying to register BlockManager
15/07/01 13:57:00 INFO BlockManagerMasterEndpoint: Registering block manager localhost:44746 with 265.4 MB RAM, BlockManagerId(driver, localhost, 44746)
15/07/01 13:57:00 INFO BlockManagerMaster: Registered BlockManager
15/07/01 13:57:00 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556
15/07/01 13:57:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB)
15/07/01 13:57:00 INFO MemoryStore: ensureFreeSpace(12635) called with curMem=143840, maxMem=278302556
15/07/01 13:57:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB)
15/07/01 13:57:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44746 (size: 12.3 KB, free: 265.4 MB)
15/07/01 13:57:00 INFO SparkContext: Created broadcast 0 from textFile at Ingest.scala:159
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at org.elasticsearch.spark.rdd.CompatUtils.<clinit>(CompatUtils.java:20)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.elasticsearch.hadoop.util.ObjectUtils.loadClass(ObjectUtils.java:71)
at org.elasticsearch.spark.package$.<init>(package.scala:14)
at org.elasticsearch.spark.package$.<clinit>(package.scala)
at build.Ingest$.main(Ingest.scala:176)
at build.Ingest.main(Ingest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 17 more
15/07/01 13:57:00 INFO SparkContext: Invoking stop() from shutdown hook
15/07/01 13:57:00 INFO SparkUI: Stopped Spark web UI at http://[IP ADDRESS]:4040
15/07/01 13:57:00 INFO DAGScheduler: Stopping DAGScheduler
15/07/01 13:57:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/01 13:57:00 INFO Utils: path = /tmp/spark-c206e297-c2ef-4bbf-9bd2-de642804bdcd/blockmgr-8d273f32-589e-4f55-98a2-cf0322a05d45, already present as root for deletion.
15/07/01 13:57:00 INFO MemoryStore: MemoryStore cleared
15/07/01 13:57:00 INFO BlockManager: BlockManager stopped
15/07/01 13:57:01 INFO BlockManagerMaster: BlockManagerMaster stopped
15/07/01 13:57:01 INFO SparkContext: Successfully stopped SparkContext
15/07/01 13:57:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/07/01 13:57:01 INFO Utils: Shutdown hook called
15/07/01 13:57:01 INFO Utils: Deleting directory /tmp/spark-c206e297-c2ef-4bbf-9bd2-de642804bdcd
Here is the build.sbt file:
scalaVersion := "2.11.6"
name := "SparkEsIngest"
version := "1.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.4.0" % "provided",
"org.apache.spark" %% "spark-sql" % "1.4.0" % "provided",
"org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.Beta2" exclude("org.spark-project.akka", "akka-remote_2.10") exclude("org.spark-project.akka", "akka-slf4j_2.10") exclude("org.json4s", "json4s-ast_2.10") exclude("org.apache.spark", "spark-catalyst_2.10") exclude("com.twitter", "chill_2.10") exclude("org.apache.spark", "spark-sql_2.10") exclude("org.json4s", "json4s-jackson_2.10") exclude("org.json4s", "json4s-core_2.10") exclude("org.apache.spark", "spark-core_2.10")
)
if ( System.getenv("QUERY_ES_RESOURCE") != null) {
println("[info] Using lib/es-hadoop-build-snapshot/ unmanagedBase dir")
unmanagedBase <<= baseDirectory { base => base / "lib/es-hadoop-build-snapshot" }
} else {
println("[info] Using lib/ unmanagedBase dir")
unmanagedBase <<= baseDirectory { base => base / "lib" }
}
resolvers += "conjars.org" at "http://conjars.org/repo"
resolvers += "clojars" at "https://clojars.org/repo"

Is the Spark JAR inside the JAR you are submitting? It seems that you tell sbt you're providing the Spark jars yourself, but I don't see "unmanagedJars in Compile += file(...)" in your build.sbt. If you're counting on the jars already being on the machine, I would suggest you not do that, since it can lead to exactly this kind of problem.
Try unpacking your JAR and checking whether the Spark classes are there; if not, use sbt-assembly or another tool of your choice.
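For reference, a minimal sbt-assembly setup might look like the sketch below. The plugin version is an assumption (pick one that matches your sbt release), and the merge strategy is just the common pattern for Spark fat jars.
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")  // version is an assumption

// build.sbt additions
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard  // drop duplicate manifests/signatures
  case _                             => MergeStrategy.first    // keep the first copy of everything else
}
Running sbt assembly then produces a single jar under target/scala-2.11/ that you can hand to spark-submit instead of the sbt package output.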

Don't use the "provided" scope, as it makes the dependencies available only while compiling and not while running the code. The changed dependencies look like this:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.0",
"org.apache.spark" %% "spark-streaming" % "1.4.0",
"org.apache.spark" %% "spark-sql" % "1.4.0",
"org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.Beta2" exclude("org.spark-project.akka", "akka-remote_2.10") exclude("org.spark-project.akka", "akka-slf4j_2.10") exclude("org.json4s", "json4s-ast_2.10") exclude("org.apache.spark", "spark-catalyst_2.10") exclude("com.twitter", "chill_2.10") exclude("org.apache.spark", "spark-sql_2.10") exclude("org.json4s", "json4s-jackson_2.10") exclude("org.json4s", "json4s-core_2.10") exclude("org.apache.spark", "spark-core_2.10")
)

Related

Failing to execute spark-submit command on a sample word count project

I am doing a tutorial on Pluralsight for Apache Spark, which is a simple word counter. I am on Windows 11 and I have IntelliJ IDEA 2022.3.1 (Ultimate Edition). Additionally, on my machine I have JDK 8, Apache Spark 3.3.1 pre-built for Hadoop 3.3 and later, and Hadoop 3.3.4. The code is written in Scala with sbt as the build tool, and I've included the code below. After packaging the file with sbt package I run the command
spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
I am receiving an exception
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat; (Full log below)
I have my dev tools (Java, Spark, Hadoop, etc) under C:\DevTools\TOOL and the Windows Environment variables are set as follows:
JAVA_HOME -> C:\DevTools\TOOL\Java
SPARK_HOME -> C:\DevTools\TOOL\Spark
HADOOP_HOME -> C:\DevTools\TOOL\Hadoop
PATH -> %JAVA_HOME%\bin; %SPARK_HOME%\bin; %HADOOP_HOME%\bin
Lastly, I've downloaded various winutils.exe and hadoop.dll builds and put them in both the Spark bin folder and the Hadoop bin folder, but nothing seems to work. Does anyone have any suggestions as to how I can get this to execute successfully?
build.sbt
name := "Word Count"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion
)
WordCount.scala
package main
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object WordCount {
def main (args: Array[String]): Unit = {
val configuration = new SparkConf().setAppName("Word Counter")
val sparkContext = new SparkContext(configuration)
val textFile = sparkContext.textFile("file:///DevTools/TOOL/Spark")
val tokenizedFileData = textFile.flatMap(line=>line.split(" "))
val countPrep = tokenizedFileData.map(word=>(word, 1))
val counts = countPrep.reduceByKey((accumValue, newValue)=>accumValue + newValue)
val storedCounts = counts.sortBy(kvPair=>kvPair._2, false)
storedCounts.saveAsTextFile("file:///DevTools/TOOL/Spark/output")
}
}
Full Log
PS C:\Users\user\Documents\Projects\WordCount> spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
23/01/26 17:00:08 INFO SparkContext: Running Spark version 3.3.1
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO SparkContext: Submitted application: Word Counter
23/01/26 17:00:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/26 17:00:08 INFO ResourceProfile: Limiting resource is cpu
23/01/26 17:00:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/26 17:00:08 INFO SecurityManager: Changing view acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing view acls groups to:
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls groups to:
23/01/26 17:00:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
23/01/26 17:00:09 INFO Utils: Successfully started service 'sparkDriver' on port 50249.
23/01/26 17:00:09 INFO SparkEnv: Registering MapOutputTracker
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMaster
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/26 17:00:09 INFO DiskBlockManager: Created local directory at C:\Users\user\AppData\Local\Temp\blockmgr-c7d05098-5b05-4121-b1b6-2e7445fc9240
23/01/26 17:00:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
23/01/26 17:00:09 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/26 17:00:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/26 17:00:10 INFO SparkContext: Added JAR file:/C:/Users/user/Documents/Projects/WordCount/target/scala-2.11/word-count_2.11-0.1.jar at spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO Executor: Starting executor ID driver on host localhost
23/01/26 17:00:10 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/01/26 17:00:10 INFO Executor: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO TransportClientFactory: Successfully created connection to localhost/192.168.1.221:50249 after 58 ms (0 ms spent in bootstraps)
23/01/26 17:00:10 INFO Utils: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar to C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6\userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a\fetchFileTemp8156211875497724521.tmp
23/01/26 17:00:11 INFO Executor: Adding file:/C:/Users/user/AppData/Local/Temp/spark-d7979eef-eac8-4a89-8ee0-246a821703d6/userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a/word-count_2.11-0.1.jar to class loader
23/01/26 17:00:11 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50306.
23/01/26 17:00:11 INFO NettyBlockTransferService: Server created on localhost:50306
23/01/26 17:00:11 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/01/26 17:00:11 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50306 with 366.3 MiB RAM, BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 358.0 KiB, free 366.0 MiB)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.3 KiB, free 365.9 MiB)
23/01/26 17:00:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50306 (size: 32.3 KiB, free: 366.3 MiB)
23/01/26 17:00:12 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:13
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:608)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:934)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:848)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:816)
at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2199)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2179)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:244)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$reduceByKey$4(PairRDDFunctions.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:323)
at main.WordCount$.main(WordCount.scala:16)
at main.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/01/26 17:00:12 INFO SparkContext: Invoking stop() from shutdown hook
23/01/26 17:00:12 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
23/01/26 17:00:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/26 17:00:12 INFO MemoryStore: MemoryStore cleared
23/01/26 17:00:12 INFO BlockManager: BlockManager stopped
23/01/26 17:00:12 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/26 17:00:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/26 17:00:12 INFO SparkContext: Successfully stopped SparkContext
23/01/26 17:00:12 INFO ShutdownHookManager: Shutdown hook called
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-26625e11-a7f1-41f7-b2b3-29f97ea9e75a
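A commonly suggested first step for this particular UnsatisfiedLinkError is to make sure hadoop.home.dir points at the Hadoop installation and that the winutils.exe/hadoop.dll binaries actually match the Hadoop line Spark was built against (3.3.x here); an outdated hadoop.dll somewhere on the PATH raises the same error even though the file exists. A minimal sketch of that check, with the path taken from the question's HADOOP_HOME (the object name is made up):
import java.io.File
import org.apache.spark.{SparkConf, SparkContext}

object WordCountPreflight {
  def main(args: Array[String]): Unit = {
    // HADOOP_HOME value is taken from the question; adjust as needed
    val hadoopHome = sys.env.getOrElse("HADOOP_HOME", "C:\\DevTools\\TOOL\\Hadoop")
    System.setProperty("hadoop.home.dir", hadoopHome)

    // Fail fast if the native helpers are missing from %HADOOP_HOME%\bin
    require(new File(hadoopHome, "bin\\winutils.exe").exists(), s"winutils.exe not found under $hadoopHome\\bin")
    require(new File(hadoopHome, "bin\\hadoop.dll").exists(), s"hadoop.dll not found under $hadoopHome\\bin")

    val sc = new SparkContext(new SparkConf().setAppName("Word Counter").setMaster("local[*]"))
    println(sc.textFile("file:///DevTools/TOOL/Spark").count())  // same input path as the question, as a smoke test
    sc.stop()
  }
}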

Exception in thread "main" java.lang.NullPointerException com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke

I would like to read a Parquet file in Azure Blob Storage, so I have mounted the data from Azure Blob to local with dbutils.fs.mount.
But I get the error Exception in thread "main" java.lang.NullPointerException.
Below is my log:
hello big data
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/10 23:20:10 INFO SparkContext: Running Spark version 2.1.0
20/06/10 23:20:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/10 23:20:11 INFO SecurityManager: Changing view acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing view acls groups to:
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls groups to:
20/06/10 23:20:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Admin); groups with view permissions: Set(); users with modify permissions: Set(Admin); groups with modify permissions: Set()
20/06/10 23:20:12 INFO Utils: Successfully started service 'sparkDriver' on port 4725.
20/06/10 23:20:12 INFO SparkEnv: Registering MapOutputTracker
20/06/10 23:20:13 INFO SparkEnv: Registering BlockManagerMaster
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/10 23:20:13 INFO DiskBlockManager: Created local directory at C:\Users\Admin\AppData\Local\Temp\blockmgr-c023c3b8-fd70-461a-ac69-24ce9c770efe
20/06/10 23:20:13 INFO MemoryStore: MemoryStore started with capacity 894.3 MB
20/06/10 23:20:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/10 23:20:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/10 23:20:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
20/06/10 23:20:13 INFO Executor: Starting executor ID driver on host localhost
20/06/10 23:20:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4738.
20/06/10 23:20:13 INFO NettyBlockTransferService: Server created on 192.168.0.102:4738
20/06/10 23:20:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/10 23:20:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.102:4738 with 894.3 MB RAM, BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:14 INFO SharedState: Warehouse path is 'file:/E:/sparkdemo/sparkdemo/spark-warehouse/'.
Exception in thread "main" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke(DBUtilsHolder.scala:17)
at com.sun.proxy.$Proxy7.fs(Unknown Source)
at Transform$.main(Transform.scala:19)
at Transform.main(Transform.scala)
20/06/10 23:20:14 INFO SparkContext: Invoking stop() from shutdown hook
20/06/10 23:20:14 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
20/06/10 23:20:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/10 23:20:14 INFO MemoryStore: MemoryStore cleared
20/06/10 23:20:14 INFO BlockManager: BlockManager stopped
20/06/10 23:20:14 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/10 23:20:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/10 23:20:14 INFO SparkContext: Successfully stopped SparkContext
20/06/10 23:20:14 INFO ShutdownHookManager: Shutdown hook called
20/06/10 23:20:14 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbdbcfe7-bc70-4d34-ad8e-5baed8308ae2
My code:
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils
import org.apache.spark.sql.SparkSession
object Demo {
def main(args:Array[String]): Unit = {
println("hello big data")
val containerName = "container1"
val storageAccountName = "storageaccount1"
val sas = "saskey"
val url = "wasbs://" + containerName + "#" + storageAccountName + ".blob.core.windows.net/"
var config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"
//Spark session
val spark : SparkSession = SparkSession.builder
.appName("SpartDemo")
.master("local[1]")
.getOrCreate()
//Mount data
dbutils.fs.mount(
source = url,
mountPoint = "/mnt/container1",
extraConfigs = Map(config -> sas))
val parquetFileDF = spark.read.parquet("/mnt/container1/test1.parquet")
parquetFileDF.show()
}
}
My sbt file:
name := "sparkdemo1"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"com.databricks" % "dbutils-api_2.11" % "0.0.3",
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0"
)
Are you running this on a Databricks instance?
If not, that's the problem: dbutils is provided by the Databricks execution context.
In that case, as far as I know, you have three options:
Package your application into a jar file and run it using a Databricks job
Use databricks-connect
Try to emulate a mocked dbutils instance outside Databricks as shown here:
com.databricks.dbutils_v1.DBUtilsHolder.dbutils0.set(
new com.databricks.dbutils_v1.DBUtilsV1{
...
}
)
Anyway, I'd say that options 1 and 2 are better than the third one. Also, by choosing one of those you don't need to include the "dbutils-api_2.11" dependency, as it is provided by the Databricks cluster.
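As a further option outside Databricks, you can skip the mount entirely and hand the SAS token to the wasbs filesystem directly; a minimal sketch, assuming the hadoop-azure and azure-storage jars are on the classpath and reusing the names from the question:
import org.apache.spark.sql.SparkSession

object DirectBlobRead {
  def main(args: Array[String]): Unit = {
    val containerName = "container1"           // values reused from the question
    val storageAccountName = "storageaccount1"
    val sas = "saskey"

    val spark = SparkSession.builder
      .appName("SparkDemo")
      .master("local[1]")
      .getOrCreate()

    // Put the SAS key into the Hadoop configuration instead of mounting via dbutils
    spark.sparkContext.hadoopConfiguration.set(
      s"fs.azure.sas.$containerName.$storageAccountName.blob.core.windows.net", sas)

    val url = s"wasbs://$containerName@$storageAccountName.blob.core.windows.net/"
    spark.read.parquet(url + "test1.parquet").show()
  }
}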

I can't debug my program in IntelliJ IDEA CE

I get Disconnected from the target VM, address: '127.0.0.1:39989', transport: 'socket' in IntelliJ IDEA CE and I can't debug my program. Any suggestions?
Connected to the target VM, address: '127.0.0.1:39989', transport: 'socket'
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/29 17:29:47 INFO SparkContext: Running Spark version 2.1.2
17/12/29 17:29:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/29 17:29:49 WARN Utils: Your hostname, ashfaq-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
17/12/29 17:29:49 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/12/29 17:29:49 INFO SecurityManager: Changing view acls to: ashfaq
17/12/29 17:29:49 INFO SecurityManager: Changing modify acls to: ashfaq
17/12/29 17:29:49 INFO SecurityManager: Changing view acls groups to:
17/12/29 17:29:49 INFO SecurityManager: Changing modify acls groups to:
17/12/29 17:29:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ashfaq); groups with view permissions: Set(); users with modify permissions: Set(ashfaq); groups with modify permissions: Set()
17/12/29 17:29:51 INFO Utils: Successfully started service 'sparkDriver' on port 46133.
17/12/29 17:29:51 INFO SparkEnv: Registering MapOutputTracker
17/12/29 17:29:51 INFO SparkEnv: Registering BlockManagerMaster
17/12/29 17:29:51 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/12/29 17:29:51 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/12/29 17:29:51 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b3b48105-28be-4781-a395-c7e83cc72e8c
17/12/29 17:29:51 INFO MemoryStore: MemoryStore started with capacity 393.1 MB
17/12/29 17:29:51 INFO SparkEnv: Registering OutputCommitCoordinator
17/12/29 17:29:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/12/29 17:29:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
17/12/29 17:29:53 INFO Executor: Starting executor ID driver on host localhost
17/12/29 17:29:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33583.
17/12/29 17:29:54 INFO NettyBlockTransferService: Server created on 10.0.2.15:33583
17/12/29 17:29:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/29 17:29:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33583 with 393.1 MB RAM, BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 236.5 KB, free 392.8 MB)
17/12/29 17:29:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 392.8 MB)
17/12/29 17:29:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33583 (size: 22.9 KB, free: 393.1 MB)
17/12/29 17:29:59 INFO SparkContext: Created broadcast 0 from textFile at scalaApp.scala:13
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ashfaq/Desktop/saclaAPP/data/UserPurchaseHistory.csv
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1968)
at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
at ScalaApp$.main(scalaApp.scala:18)
at ScalaApp.main(scalaApp.scala)
17/12/29 17:29:59 INFO SparkContext: Invoking stop() from shutdown hook
17/12/29 17:29:59 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
17/12/29 17:29:59 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.0.2.15:33583 in memory (size: 22.9 KB, free: 393.1 MB)
17/12/29 17:29:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/29 17:30:00 INFO MemoryStore: MemoryStore cleared
17/12/29 17:30:00 INFO BlockManager: BlockManager stopped
17/12/29 17:30:00 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/29 17:30:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/12/29 17:30:00 INFO SparkContext: Successfully stopped SparkContext
17/12/29 17:30:00 INFO ShutdownHookManager: Shutdown hook called
Disconnected from the target VM, address: '127.0.0.1:39989', transport: 'socket'
17/12/29 17:30:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-58667739-7c15-4665-8ede-fde9c3ff1d83
Process finished with exit code 1
It looks like you are trying to open a file which doesn't exist. The first line of the error message says so:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ashfaq/Desktop/saclaAPP/data/UserPurchaseHistory.csv

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$scope()Lscala/xml/TopScope$;

I am running a word count program in Spark but I am getting the error below.
I have added scala-xml_2.11-1.0.2.jar.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/16 05:14:02 INFO SparkContext: Running Spark version 2.0.2
16/12/16 05:14:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/16 05:14:03 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.59.132 instead (on interface ens33)
16/12/16 05:14:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/12/16 05:14:04 INFO SecurityManager: Changing view acls to: hadoopusr
16/12/16 05:14:04 INFO SecurityManager: Changing modify acls to: hadoopusr
16/12/16 05:14:04 INFO SecurityManager: Changing view acls groups to:
16/12/16 05:14:04 INFO SecurityManager: Changing modify acls groups to:
16/12/16 05:14:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopusr); groups with view permissions: Set(); users with modify permissions: Set(hadoopusr); groups with modify permissions: Set()
16/12/16 05:14:05 INFO Utils: Successfully started service 'sparkDriver' on port 40559.
16/12/16 05:14:05 INFO SparkEnv: Registering MapOutputTracker
16/12/16 05:14:05 INFO SparkEnv: Registering BlockManagerMaster
16/12/16 05:14:05 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0b830180-ae51-451f-9673-4f98dbaff520
16/12/16 05:14:05 INFO MemoryStore: MemoryStore started with capacity 433.6 MB
16/12/16 05:14:05 INFO SparkEnv: Registering OutputCommitCoordinator
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$scope()Lscala/xml/TopScope$;
at org.apache.spark.ui.jobs.StagePage.<init>(StagePage.scala:44)
at org.apache.spark.ui.jobs.StagesTab.<init>(StagesTab.scala:34)
at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:62)
at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:219)
at org.apache.spark.ui.SparkUI$.createLiveUI(SparkUI.scala:161)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:440)
at LearnScala.WordCount$.main(WordCount.scala:15)
at LearnScala.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
16/12/16 05:14:05 INFO DiskBlockManager: Shutdown hook called
16/12/16 05:14:05 INFO ShutdownHookManager: Shutdown hook called
16/12/16 05:14:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-789e9a76-894f-468b-a39a-cf00da30e4ba/userFiles-3656d5f8-25ba-45c4-b2f6-9f654a049bb1
16/12/16 05:14:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-789e9a76-894f-468b-a39a-cf00da30e4ba
I am using the versions below.
build.sbt:
name := "SparkApps"
version := "1.0"
scalaVersion := "2.11.5"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.2"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.2"
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.10
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "2.0.2"
// https://mvnrepository.com/artifact/org.apache.spark/spark-yarn_2.11
libraryDependencies += "org.apache.spark" % "spark-yarn_2.10" % "2.0.2"
Spark version: 2.0.2
I am running a word count program in Spark but I am getting the below error. I have added scala-xml_2.11-1.0.2.jar
Later we can see:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.2"
Choose one ;) Scala 2.10 or Scala 2.11. Either change the scala-xml version to 2.10 or change the Spark artifacts to 2.11. As of Spark 2.0, Scala 2.11 is recommended.
You can pull in the matching Scala version automatically by using %% in build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
Secondly, there is no scala-xml dependency declared in build.sbt - you should add it.
Finally, you must add all third-party jars to spark-submit via the --jars option or build an uber jar - see this question.
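Putting that together, a build.sbt with everything on Scala 2.11 might look like the sketch below (the scala-xml version is an assumption):
name := "SparkApps"
version := "1.0"
scalaVersion := "2.11.5"

// %% appends the Scala binary version, so every artifact resolves to _2.11
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.2",
  "org.apache.spark" %% "spark-sql"       % "2.0.2",
  "org.apache.spark" %% "spark-streaming" % "2.0.2",
  "org.apache.spark" %% "spark-yarn"      % "2.0.2",
  "org.scala-lang.modules" %% "scala-xml" % "1.0.2"  // version is an assumption
)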

Cannot run spark jobs locally using sbt, but works in IntelliJ

I've written a few simple Spark jobs and some tests for them. I've done everything in IntelliJ and it works great. Now, I'd like to make sure my code builds with sbt. Compiling is fine, but I get strange errors during running and testing.
I am using Scala version 2.11.8 and sbt version 0.13.8
My build.sbt file looks like this:
name := "test"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "javax.mail" % "javax.mail-api" % "1.5.6"
libraryDependencies += "com.sun.mail" % "javax.mail" % "1.5.6"
libraryDependencies += "commons-cli" % "commons-cli" % "1.3.1"
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "3.0.0" % "test"
libraryDependencies += "com.holdenkarau" % "spark-testing-base_2.11" % "2.0.0_0.4.4" % "test" intransitive()
I try to run my code using sbt "run-main com.test.email.processor.bin.Runner". Here is the output:
[info] Loading project definition from /Users/max/workplace/test/project
[info] Set current project to test (in build file:/Users/max/workplace/test/)
[info] Running com.test.email.processor.bin.Runner -j recipientCount -e /Users/max/workplace/data/test/enron_with_categories/*/*.txt
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/23 18:46:55 INFO SparkContext: Running Spark version 2.0.0
16/08/23 18:46:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/23 18:46:55 INFO SecurityManager: Changing view acls to: max
16/08/23 18:46:55 INFO SecurityManager: Changing modify acls to: max
16/08/23 18:46:55 INFO SecurityManager: Changing view acls groups to:
16/08/23 18:46:55 INFO SecurityManager: Changing modify acls groups to:
16/08/23 18:46:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(max); groups with view permissions: Set(); users with modify permissions: Set(max); groups with modify permissions: Set()
16/08/23 18:46:56 INFO Utils: Successfully started service 'sparkDriver' on port 61759.
16/08/23 18:46:56 INFO SparkEnv: Registering MapOutputTracker
16/08/23 18:46:56 INFO SparkEnv: Registering BlockManagerMaster
16/08/23 18:46:56 INFO DiskBlockManager: Created local directory at /private/var/folders/75/4dydy_6110v0gjv7bg265_g40000gn/T/blockmgr-9eb526c0-b7e5-444a-b186-d7f248c5dc62
16/08/23 18:46:56 INFO MemoryStore: MemoryStore started with capacity 408.9 MB
16/08/23 18:46:56 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/23 18:46:56 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/23 18:46:56 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.11:4040
16/08/23 18:46:56 INFO Executor: Starting executor ID driver on host localhost
16/08/23 18:46:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61760.
16/08/23 18:46:57 INFO NettyBlockTransferService: Server created on 192.168.1.11:61760
16/08/23 18:46:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:61760 with 408.9 MB RAM, BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 128.0 KB, free 408.8 MB)
16/08/23 18:46:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.6 KB, free 408.8 MB)
16/08/23 18:46:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.11:61760 (size: 14.6 KB, free: 408.9 MB)
16/08/23 18:46:57 INFO SparkContext: Created broadcast 0 from wholeTextFiles at RecipientCountJob.scala:22
16/08/23 18:46:58 WARN ClosureCleaner: Expected a closure; got com.test.email.processor.util.cleanEmail$
16/08/23 18:46:58 INFO FileInputFormat: Total input paths to process : 1702
16/08/23 18:46:58 INFO FileInputFormat: Total input paths to process : 1702
16/08/23 18:46:58 INFO CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
16/08/23 18:46:58 INFO SparkContext: Starting job: take at RecipientCountJob.scala:35
16/08/23 18:46:58 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.ClassNotFoundException: scala.Function0
at sbt.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:63)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.immutable.Range.foreach(Range.scala:166)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.twitter.chill.KryoBase.<init>(KryoBase.scala:41)
at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:57)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:86)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:274)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:259)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:175)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:182)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:178)
at org.apache.spark.shuffle.sort.SortShuffleManager$.canUseSerializedShuffle(SortShuffleManager.scala:187)
at org.apache.spark.shuffle.sort.SortShuffleManager.registerShuffle(SortShuffleManager.scala:99)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:90)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:91)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:235)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:233)
at org.apache.spark.scheduler.DAGScheduler.visit$2(DAGScheduler.scala:418)
at org.apache.spark.scheduler.DAGScheduler.getAncestorShuffleDependencies(DAGScheduler.scala:433)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:288)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:394)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:391)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:391)
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:403)
at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:304)
at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:339)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:849)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/08/23 18:46:58 INFO DAGScheduler: Job 0 failed: take at RecipientCountJob.scala:35, took 0.076653 s
[error] (run-main-0) java.lang.ClassNotFoundException: scala.Function0
java.lang.ClassNotFoundException: scala.Function0
[trace] Stack trace suppressed: run last compile:runMain for the full output.
16/08/23 18:46:58 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:175)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1229)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
16/08/23 18:46:58 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:67)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:66)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:66)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:65)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1229)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:64)
java.lang.RuntimeException: Nonzero exit code: 1
It would appear you are missing the scala-library dependency, as scala.Function0 comes from the standard Scala library.
You could try adding scala-library explicitly in the relevant scopes:
libraryDependencies += "org.scala-lang" % "scala-library" % scalaVersion.value
But it seems like the scala-library is not being added to the classpath of your run.
You might also want to add something like the following, so that the same classpath used to compile the code is used to run it in sbt:
fullClasspath in run := (fullClasspath in Compile).value
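A sketch that combines both suggestions; fork in run := true is an extra setting, not mentioned above, that is commonly used to run Spark in a separate JVM so sbt's own classloader stays out of the way:
libraryDependencies += "org.scala-lang" % "scala-library" % scalaVersion.value

// Run the application in a forked JVM, using the compile classpath
fork in run := true
fullClasspath in run := (fullClasspath in Compile).value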
Apparently, Spark cannot be run via sbt. I ended up packaging the entire job into a jar using the assembly plugin and running it with java.