ERROR SparkContext: Error initializing SparkContext locally - scala

I'm new to everything related to java/scala/maven/sbt/spark, so please bear with me.
Managed to get everything up and running, or at least so it seems when running the spark-shell locally. The SparkContext gets initialized properly and I can create RDDs.
However, when I call spark-submit locally, I get a SparkContext error.
%SPARK_HOME%\bin\spark-submit --master local --class example.bd.MyApp target\scala-2.12\sparkapp_2.12-1.5.5.jar
This is my code.
package example.bd

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object MyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("My first Spark application")
    val sc = new SparkContext(conf)
    println("hi everyone")
  }
}
These are my error logs:
21/10/24 18:38:45 WARN Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/24 18:38:45 INFO SparkContext: Running Spark version 3.1.2
21/10/24 18:38:45 INFO ResourceUtils: ==============================================================
21/10/24 18:38:45 INFO ResourceUtils: No custom resources configured for spark.driver.
21/10/24 18:38:45 INFO ResourceUtils: ==============================================================
21/10/24 18:38:45 INFO SparkContext: Submitted application: My first Spark application
21/10/24 18:38:45 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/10/24 18:38:45 INFO ResourceProfile: Limiting resource is cpu
21/10/24 18:38:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/10/24 18:38:45 INFO SecurityManager: Changing view acls to: User
21/10/24 18:38:45 INFO SecurityManager: Changing modify acls to: User
21/10/24 18:38:45 INFO SecurityManager: Changing view acls groups to:
21/10/24 18:38:45 INFO SecurityManager: Changing modify acls groups to:
21/10/24 18:38:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(User); groups with view permissions: Set(); users with modify permissions: Set(User); groups with modify permissions: Set()
21/10/24 18:38:46 INFO Utils: Successfully started service 'sparkDriver' on port 56899.
21/10/24 18:38:46 INFO SparkEnv: Registering MapOutputTracker
21/10/24 18:38:46 INFO SparkEnv: Registering BlockManagerMaster
21/10/24 18:38:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/24 18:38:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/24 18:38:46 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/10/24 18:38:46 INFO DiskBlockManager: Created local directory at C:\Users\User\AppData\Local\Temp\blockmgr-8df572ec-4206-48ae-b517-bd56242fca4c
21/10/24 18:38:46 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/10/24 18:38:46 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/24 18:38:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/24 18:38:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-RKJJV5E4:4040
21/10/24 18:38:46 INFO SparkContext: Added JAR file:/C:/[path]/sparkapp/target/scala-2.12/sparkapp_2.12-1.5.5.jar at spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar with timestamp 1635093525663
21/10/24 18:38:46 INFO Executor: Starting executor ID driver on host LAPTOP-RKJJV5E4
21/10/24 18:38:46 INFO Executor: Fetching spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar with timestamp 1635093525663
21/10/24 18:38:46 INFO TransportClientFactory: Successfully created connection to LAPTOP-RKJJV5E4/192.168.0.175:56899 after 30 ms (0 ms spent in bootstraps)
21/10/24 18:38:46 INFO Utils: Fetching spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar to C:\Users\User\AppData\Local\Temp\spark-9c13f31f-92a7-4fc5-a87d-a0ae6410e6d2\userFiles-5ac95e31-3656-4a4d-a205-e0750c041bcb\fetchFileTemp7994667570718611461.tmp
21/10/24 18:38:46 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:736)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:271)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1120)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1106)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:563)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
... 6 more
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:46 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-RKJJV5E4:4040
21/10/24 18:38:46 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:173)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:881)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2370)
at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2069)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1419)
at org.apache.spark.SparkContext.stop(SparkContext.scala:2069)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:671)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/10/24 18:38:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/24 18:38:46 INFO MemoryStore: MemoryStore cleared
21/10/24 18:38:46 INFO BlockManager: BlockManager stopped
21/10/24 18:38:46 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/24 18:38:46 WARN MetricsSystem: Stopping a MetricsSystem that is not running
21/10/24 18:38:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/24 18:38:46 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:736)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:271)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1120)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1106)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:563)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
... 6 more
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:47 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.NullPointerException
at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:332)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
at org.apache.spark.executor.Executor.stop(Executor.scala:332)
at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:76)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/10/24 18:38:47 INFO ShutdownHookManager: Shutdown hook called
21/10/24 18:38:47 INFO ShutdownHookManager: Deleting directory C:\Users\User\AppData\Local\Temp\spark-ac076276-e6ab-4cfb-a153-56ad35c46d56
21/10/24 18:38:47 INFO ShutdownHookManager: Deleting directory C:\Users\User\AppData\Local\Temp\spark-9c13f31f-92a7-4fc5-a87d-a0ae6410e6d2
As far as I can tell the errors are directly related to Hadoop not being set up, but that shouldn't be an issue given that I am trying to run it locally, no?
Any help will be greatly appreciated.
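For context: even in local mode, Spark on Windows goes through Hadoop's Shell utilities (the FileUtil.chmod call in the trace), so spark-submit needs HADOOP_HOME or the hadoop.home.dir system property pointing at a directory that contains bin\winutils.exe. A minimal sketch of the in-code workaround, assuming winutils.exe has been downloaded to C:\hadoop\bin (the path is an assumption, not from the post):

package example.bd

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object MyApp {
  def main(args: Array[String]): Unit = {
    // Assumption: winutils.exe lives at C:\hadoop\bin\winutils.exe.
    // This must run before any Hadoop class loads, or it has no effect.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("My first Spark application")
    val sc = new SparkContext(conf)
    println("hi everyone")
    sc.stop()
  }
}

Note that in this trace Hadoop's Shell class is already initialized by spark-submit itself (see the first WARN), so setting the HADOOP_HOME environment variable before launching, with winutils.exe under %HADOOP_HOME%\bin, is the more reliable route.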

Related

Failing to execute spark-submit command on a sample word count project

I am doing a tutorial on Pluralsight for Apache Spark which is a simple word counter. I am on Windows 11 and I have IntelliJ IDEA 2022.3.1 (Ultimate Edition). Additionally, on my machine I have JDK 8, Apache Spark 3.3.1 pre-built for Hadoop 3.3 and later, and Hadoop 3.3.4. The code is written in Scala with SBT as the build tool, and I've included the code below. After packaging the file with sbt package I run the command
spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
I am receiving an exception
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat; (Full log below)
I have my dev tools (Java, Spark, Hadoop, etc.) under C:\DevTools\TOOL and the Windows environment variables are set as follows:
JAVA_HOME -> C:\DevTools\TOOL\Java
SPARK_HOME -> C:\DevTools\TOOL\Spark
HADOOP_HOME -> C:\DevTools\TOOL\Hadoop
PATH -> %JAVA_HOME%\bin; %SPARK_HOME%\bin; %HADOOP_HOME%\bin
Lastly, I've downloaded various winutils.exe and hadoop.dll builds and put them in both the Spark bin folder and the Hadoop bin folder, but nothing seems to work. Does anyone have any suggestions as to how I can get this to execute successfully?
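An UnsatisfiedLinkError on NativeIO$POSIX.stat usually means the JVM loaded a hadoop.dll that does not match the Hadoop version (or JVM bitness) in use, or never found one on java.library.path at all. A quick diagnostic sketch, assuming the C:\DevTools\TOOL layout described above:

object NativeCheck {
  def main(args: Array[String]): Unit = {
    // Force-load the native library by absolute path; a version mismatch
    // or missing dependency fails here with a clearer message than Spark's.
    System.load("C:\\DevTools\\TOOL\\Hadoop\\bin\\hadoop.dll")
    println("hadoop.dll loaded OK")
  }
}

The hadoop.dll should come from the same winutils distribution as winutils.exe, built for Hadoop 3.3.x, live in %HADOOP_HOME%\bin, and match a 64-bit JVM.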
build.sbt
name := "Word Count"

version := "0.1"

scalaVersion := "2.11.8"

val sparkVersion = "1.6.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
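As an aside, the log further down shows Spark 3.3.1 at runtime while this build pins Scala 2.11.8 and Spark 1.6.1; Spark 3.3.x is published for Scala 2.12/2.13 only, so a jar built this way cannot match the installed runtime. A build.sbt aligned with the runtime might look like the following sketch (versions inferred from the log, not from the tutorial):

name := "Word Count"

version := "0.1"

scalaVersion := "2.12.17"

val sparkVersion = "3.3.1"

libraryDependencies ++= Seq(
  // "provided" because spark-submit supplies the Spark classes at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)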
WordCount.scala
package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]): Unit = {
    val configuration = new SparkConf().setAppName("Word Counter")
    val sparkContext = new SparkContext(configuration)

    val textFile = sparkContext.textFile("file:///DevTools/TOOL/Spark")
    val tokenizedFileData = textFile.flatMap(line => line.split(" "))
    val countPrep = tokenizedFileData.map(word => (word, 1))
    val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
    val storedCounts = counts.sortBy(kvPair => kvPair._2, false)

    storedCounts.saveAsTextFile("file:///DevTools/TOOL/Spark/output")
  }
}
Full Log
PS C:\Users\user\Documents\Projects\WordCount> spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
23/01/26 17:00:08 INFO SparkContext: Running Spark version 3.3.1
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO SparkContext: Submitted application: Word Counter
23/01/26 17:00:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/26 17:00:08 INFO ResourceProfile: Limiting resource is cpu
23/01/26 17:00:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/26 17:00:08 INFO SecurityManager: Changing view acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing view acls groups to:
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls groups to:
23/01/26 17:00:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
23/01/26 17:00:09 INFO Utils: Successfully started service 'sparkDriver' on port 50249.
23/01/26 17:00:09 INFO SparkEnv: Registering MapOutputTracker
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMaster
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/26 17:00:09 INFO DiskBlockManager: Created local directory at C:\Users\user\AppData\Local\Temp\blockmgr-c7d05098-5b05-4121-b1b6-2e7445fc9240
23/01/26 17:00:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
23/01/26 17:00:09 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/26 17:00:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/26 17:00:10 INFO SparkContext: Added JAR file:/C:/Users/user/Documents/Projects/WordCount/target/scala-2.11/word-count_2.11-0.1.jar at spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO Executor: Starting executor ID driver on host localhost
23/01/26 17:00:10 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/01/26 17:00:10 INFO Executor: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO TransportClientFactory: Successfully created connection to localhost/192.168.1.221:50249 after 58 ms (0 ms spent in bootstraps)
23/01/26 17:00:10 INFO Utils: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar to C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6\userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a\fetchFileTemp8156211875497724521.tmp
23/01/26 17:00:11 INFO Executor: Adding file:/C:/Users/user/AppData/Local/Temp/spark-d7979eef-eac8-4a89-8ee0-246a821703d6/userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a/word-count_2.11-0.1.jar to class loader
23/01/26 17:00:11 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50306.
23/01/26 17:00:11 INFO NettyBlockTransferService: Server created on localhost:50306
23/01/26 17:00:11 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/01/26 17:00:11 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50306 with 366.3 MiB RAM, BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 358.0 KiB, free 366.0 MiB)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.3 KiB, free 365.9 MiB)
23/01/26 17:00:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50306 (size: 32.3 KiB, free: 366.3 MiB)
23/01/26 17:00:12 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:13
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:608)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:934)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:848)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:816)
at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2199)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2179)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:244)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$reduceByKey$4(PairRDDFunctions.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:323)
at main.WordCount$.main(WordCount.scala:16)
at main.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/01/26 17:00:12 INFO SparkContext: Invoking stop() from shutdown hook
23/01/26 17:00:12 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
23/01/26 17:00:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/26 17:00:12 INFO MemoryStore: MemoryStore cleared
23/01/26 17:00:12 INFO BlockManager: BlockManager stopped
23/01/26 17:00:12 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/26 17:00:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/26 17:00:12 INFO SparkContext: Successfully stopped SparkContext
23/01/26 17:00:12 INFO ShutdownHookManager: Shutdown hook called
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-26625e11-a7f1-41f7-b2b3-29f97ea9e75a

Spark MongoDB connector V10 issue

I am having a problem connecting to MongoDB using Spark Structured Streaming.
Here is my Python code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

# from lib.logger import Log4j

if __name__ == "__main__":
    spark = (SparkSession
             .builder
             .appName("Streaming from mongo db")
             .master("local[3]")
             .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_10.0.5:3.3.1')
             .config("spark.streaming.stopGracefullyOnShutdown", "true")
             # .config("spark.sql.shuffle.partitions", 3)
             .getOrCreate())

    read_from_mongo = (spark
                       .readStream
                       .format("mongodb")
                       .option("uri", "mongodb://admin:admin@localhost:27017")
                       .option("database", "first_db")
                       .option("collection", "first_collection")
                       .load()
                       .writeStream
                       .format("console")
                       .trigger(continuous="1 second")
                       .outputMode("append"))

    y = read_from_mongo.start()
I am running the script using spark-submit file_name.py, and the output I'm getting is the following:
22/11/16 16:04:21 INFO SparkContext: Running Spark version 3.3.1
22/11/16 16:04:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/16 16:04:21 INFO ResourceUtils: ==============================================================
22/11/16 16:04:21 INFO ResourceUtils: No custom resources configured for spark.driver.
22/11/16 16:04:21 INFO ResourceUtils: ==============================================================
22/11/16 16:04:21 INFO SparkContext: Submitted application: Streaming from mongo db
22/11/16 16:04:21 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/11/16 16:04:21 INFO ResourceProfile: Limiting resource is cpu
22/11/16 16:04:21 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/11/16 16:04:21 INFO SecurityManager: Changing view acls to: mustaphaaminedebbih
22/11/16 16:04:21 INFO SecurityManager: Changing modify acls to: mustaphaaminedebbih
22/11/16 16:04:21 INFO SecurityManager: Changing view acls groups to:
22/11/16 16:04:21 INFO SecurityManager: Changing modify acls groups to:
22/11/16 16:04:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mustaphaaminedebbih); groups with view permissions: Set(); users with modify permissions: Set(mustaphaaminedebbih); groups with modify permissions: Set()
22/11/16 16:04:21 INFO Utils: Successfully started service 'sparkDriver' on port 52063.
22/11/16 16:04:21 INFO SparkEnv: Registering MapOutputTracker
22/11/16 16:04:21 INFO SparkEnv: Registering BlockManagerMaster
22/11/16 16:04:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/11/16 16:04:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/11/16 16:04:21 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/11/16 16:04:21 INFO DiskBlockManager: Created local directory at /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/blockmgr-53c25f51-35a7-44ab-bc1d-bb16a7ae050c
22/11/16 16:04:22 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
22/11/16 16:04:22 INFO SparkEnv: Registering OutputCommitCoordinator
22/11/16 16:04:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/11/16 16:04:22 INFO Executor: Starting executor ID driver on host 192.168.9.44
22/11/16 16:04:22 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
22/11/16 16:04:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52065.
22/11/16 16:04:22 INFO NettyBlockTransferService: Server created on 192.168.9.44:52065
22/11/16 16:04:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/11/16 16:04:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.9.44:52065 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/11/16 16:04:22 INFO SharedState: Warehouse path is 'file:/Users/mustaphaaminedebbih/Desktop/Scrambling/Spark%20Streaming/Streaming%20From%20Mongodb/spark-warehouse'.
Traceback (most recent call last):
File "/Users/mustaphaaminedebbih/Desktop/Scrambling/Spark Streaming/Streaming From Mongodb/test.py", line 19, in
read_from_mongo = (spark
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/streaming.py", line 469, in load
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in call
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
java.lang.ClassNotFoundException:
Failed to find data source: mongodb. Please find packages at
https://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:157)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:144)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: mongodb.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:435)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)
... 14 more
22/11/16 16:04:23 INFO SparkContext: Invoking stop() from shutdown hook
22/11/16 16:04:23 INFO SparkUI: Stopped Spark web UI at http://192.168.9.44:4040
22/11/16 16:04:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/16 16:04:23 INFO MemoryStore: MemoryStore cleared
22/11/16 16:04:23 INFO BlockManager: BlockManager stopped
22/11/16 16:04:23 INFO BlockManagerMaster: BlockManagerMaster stopped
22/11/16 16:04:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/16 16:04:23 INFO SparkContext: Successfully stopped SparkContext
22/11/16 16:04:23 INFO ShutdownHookManager: Shutdown hook called
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-2ac1c680-c1d8-46e1-bfae-40c82cee015f
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-1816dfc9-2a95-4361-88d1-025c999f1514
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-1816dfc9-2a95-4361-88d1-025c999f1514/pyspark-095a2954-0010-4da3-b658-2483dcc79afc
I tried almost every possible solution, but nothing worked.
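One concrete thing stands out: spark.jars.packages takes Maven coordinates of the form groupId:artifactId:version, where the Scala suffix belongs to the artifactId. The value above, org.mongodb.spark:mongo-spark-connector_10.0.5:3.3.1, puts the connector version where the Scala suffix goes, so the package can never resolve, which matches the ClassNotFoundException for the mongodb source. A sketch of the corrected session config, written in Scala for consistency with the rest of this page (the _2.12 suffix assumes Spark 3.3.1's default Scala build):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Streaming from mongo db")
  .master("local[3]")
  // The artifactId carries the Scala suffix; the connector version goes last.
  .config("spark.jars.packages",
    "org.mongodb.spark:mongo-spark-connector_2.12:10.0.5")
  .config("spark.streaming.stopGracefullyOnShutdown", "true")
  .getOrCreate()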

java.lang.NoClassDefFoundError: scala/Product$class Error while reading Redshift Table from EMR

Here is the code I am running on EMR:
import pyspark
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark import SQLContext, SparkContext, SparkConf

spark = SparkSession.builder.getOrCreate()
sql_context = SQLContext(spark)
sc = spark.sparkContext

url = "jdbc:redshift://redshift-cluster-endpoint.amazonaws.com:5439/db-name?user=<USER>&password=<PWD>"

df = sql_context.read \
    .format("com.databricks.spark.redshift") \
    .option("url", url) \
    .option("dbtable", "schema_name.table_name") \
    .option("tempdir", "s3://temp-bucket/temp_data_pyspark/") \
    .option("forward_spark_s3_credentials", "true") \
    .load()

print(df.head(10))
This is my spark-submit command:
spark-submit \
--jars s3://aws-emr-resources-bucket-us-east-1/RedshiftJDBC41-1.2.12.1017.jar,s3://aws-emr-resources-bucket-us-east-1/minimal-json-0.9.4.jar,s3://aws-emr-resources-bucket-us-east-1/spark-avro_2.11-3.0.0.jar,s3://aws-emr-resources-bucket-us-east-1/spark-redshift_2.10-2.0.0.jar \
--packages com.databricks:spark-redshift_2.11:2.0.1 \
python-script.py
The error I am facing is:
Traceback (most recent call last):
File "/home/hadoop/python-script.py", line 18, in <module>
.option("forward_spark_s3_credentials", "true") \
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 210, in load
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o67.load.
: java.lang.NoClassDefFoundError: scala/Product$class
at com.databricks.spark.redshift.Parameters$MergedParameters.<init>(Parameters.scala:78)
at com.databricks.spark.redshift.Parameters$.mergeParameters(Parameters.scala:72)
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 20 more
22/09/06 08:29:38 INFO SparkContext: Invoking stop() from shutdown hook
22/09/06 08:29:38 INFO AbstractConnector: Stopped Spark@2edd2970{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/09/06 08:29:38 INFO SparkUI: Stopped Spark web UI at http://ip-10-50-105-28.ec2.internal:4040
22/09/06 08:29:38 INFO YarnClientSchedulerBackend: Interrupting monitor thread
22/09/06 08:29:38 INFO YarnClientSchedulerBackend: Shutting down all executors
22/09/06 08:29:38 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/06 08:29:38 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
22/09/06 08:29:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/06 08:29:38 INFO MemoryStore: MemoryStore cleared
22/09/06 08:29:38 INFO BlockManager: BlockManager stopped
22/09/06 08:29:38 INFO BlockManagerMaster: BlockManagerMaster stopped
22/09/06 08:29:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/06 08:29:38 INFO SparkContext: Successfully stopped SparkContext
22/09/06 08:29:38 INFO ShutdownHookManager: Shutdown hook called
22/09/06 08:29:38 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fefa4b92-1737-4087-ba47-e8a81a3aa2e0
22/09/06 08:29:38 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-a58798f3-021f-4a8a-88f6-5bd54d57b428/pyspark-442297be-93ef-494d-af78-57b88db7395e
22/09/06 08:29:38 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-a58798f3-021f-4a8a-88f6-5bd54d57b428
Upon searching this error, I found that it means I must be mixing different Scala version binaries. Any help in identifying the compatible versions of the above jars is appreciated; resources/links would be most beneficial.
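For reference: NoClassDefFoundError: scala/Product$class is the classic signature of Scala 2.10/2.11 bytecode running on a Scala 2.12+ runtime, because trait implementation classes such as Product$class were removed in Scala 2.12. The --jars list above mixes spark-redshift_2.10-2.0.0.jar with _2.11 artifacts, while Spark on EMR 6.x is built for Scala 2.12. The alignment rule, sketched in sbt terms (versions are illustrative assumptions):

// All Spark-ecosystem dependencies must share one Scala binary suffix,
// and that suffix must match the cluster's Spark build (2.12 on EMR 6.x).
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // %% appends _2.12 automatically, which prevents mixed-suffix jars.
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-avro" % "3.1.2"
)

Note that com.databricks:spark-redshift was only ever published for Scala 2.10/2.11, so on a 2.12 cluster a 2.12 build of a Redshift connector (for example the community-maintained spark-redshift fork) would be needed.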

Connection ERROR while writing Dataframe (Pyspark 3.x on EMR 6.x) to RDS (MySQL)

I get "connection refused error" when I try to write the results of a Dataframe to an RDS (MySQL). I am using PySpark 3 on EMR cluster v6.x (1 master node, 1 slave node). The table does not exist yet. But the data base exist.
spark-submit --jars s3://{some s3 folder}/mysql-connector-java-8.0.25.jar s3://{some s3 folder}/pyspark_script.py
The part of the script that writes to MySQL is below (after testing, it's the only part of the script that fails). Note: I have changed the names of my database, user, and password.
df.write \
    .mode("overwrite") \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/{my database name}?useSSL=false") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "mydb_table") \
    .option("user", "myuser") \
    .option("password", "mypassword") \
    .save()
This is the error I get (it is about a refused connection), even though I have already given the EMR role access to RDS and its data:
Traceback (most recent call last):
File "/mnt/tmp/spark-93919f38-ea4d-44d6-be7d-0416be972753/pyspark_script.py", line 57, in <module>
.option("password","assignment")\
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o163.save.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:833)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:453)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:68)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:48)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:89)
at com.mysql.cj.NativeSession.connect(NativeSession.java:144)
at com.mysql.cj.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:953)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:823)
... 45 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.mysql.cj.protocol.StandardSocketFactory.connect(StandardSocketFactory.java:155)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:63)
... 48 more
21/12/19 11:40:04 INFO SparkContext: Invoking stop() from shutdown hook
21/12/19 11:40:04 INFO AbstractConnector: Stopped Spark@74d96709{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/12/19 11:40:04 INFO SparkUI: Stopped Spark web UI at http://{ip}.eu-central-1.compute.internal:4040
21/12/19 11:40:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/12/19 11:40:04 INFO MemoryStore: MemoryStore cleared
21/12/19 11:40:04 INFO BlockManager: BlockManager stopped
21/12/19 11:40:04 INFO BlockManagerMaster: BlockManagerMaster stopped
21/12/19 11:40:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/12/19 11:40:04 INFO SparkContext: Successfully stopped SparkContext
21/12/19 11:40:04 INFO ShutdownHookManager: Shutdown hook called
21/12/19 11:40:04 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fd1b8e7c-7b4c-424d-a451-743a6e075fbd
21/12/19 11:40:04 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-93919f38-ea4d-44d6-be7d-0416be972753
21/12/19 11:40:04 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fd1b8e7c-7b4c-424d-a451-743a6e075fbd/pyspark-40fbaaf5-2e34-44ba-875f-88308084546d
Try this thread: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
EMR with a localhost RDS seems odd; did you miss setting the IP correctly?
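Expanding on that comment: the bottom of the trace shows java.net.ConnectException against localhost:3306, i.e. the driver is dialing the EMR master node itself, where no MySQL server is listening. An RDS target needs the instance endpoint in the JDBC URL. A sketch in Scala for consistency with the rest of this page (the endpoint is a placeholder, not from the post; df is the DataFrame from the script):

// Placeholder endpoint: copy the real one from the RDS console.
val jdbcUrl =
  "jdbc:mysql://mydb.abc123xyz.eu-central-1.rds.amazonaws.com:3306/mydb?useSSL=false"

df.write
  .mode("overwrite")
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("dbtable", "mydb_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .save()

The RDS security group must also allow inbound traffic on port 3306 from the EMR nodes; an IAM role granting RDS API access does not open the network path.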

Spark Submit not able to pick classpath from jar

I have created a Spark job which gets data from one Cassandra table and inserts it into another. I am using Gradle to build the jar file, and I have managed to create a jar with all dependencies. I am using the below command to trigger the Spark job:
spark-submit --class DataMigration OrderAnalytics.jar
All required jars are present inside OrderAnalytics.jar (i.e. under lib/**), yet I am still getting a NoClassDefFoundError, as below:
17/08/19 22:56:38 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
And the META-INF manifest looks like this:
Manifest-Version: 1.0
Main-Class: DataMigration
Class-Path: lib/spark-sql_2.11-2.2.0.jar lib/spark-cassandra-connector
_2.11-2.0.3.jar lib/univocity-parsers-2.2.1.jar lib/spark-sketch_2.11
-2.2.0.jar lib/spark-core_2.11-2.2.0.jar lib/spark-catalyst_2.11-2.2.
0.jar lib/spark-tags_2.11-2.2.0.jar lib/parquet-column-1.8.2.jar lib/
parquet-hadoop-1.8.2.jar lib/jackson-databind-2.6.5.jar lib/xbean-asm
5-shaded-4.4.jar lib/unused-1.0.0.jar lib/jsr166e-1.1.0.jar lib/commo
ns-beanutils-1.9.3.jar lib/joda-time-2.3.jar lib/joda-convert-1.2.jar
lib/scala-reflect-2.11.8.jar lib/avro-1.7.7.jar lib/avro-mapred-1.7.
7-hadoop2.jar lib/chill_2.11-0.8.0.jar lib/chill-java-0.8.0.jar lib/h
adoop-client-2.6.5.jar lib/spark-launcher_2.11-2.2.0.jar lib/spark-ne
twork-common_2.11-2.2.0.jar lib/spark-network-shuffle_2.11-2.2.0.jar
lib/spark-unsafe_2.11-2.2.0.jar lib/jets3t-0.9.3.jar lib/curator-reci
pes-2.6.0.jar lib/javax.servlet-api-3.1.0.jar lib/commons-lang3-3.5.j
ar lib/commons-math3-3.4.1.jar lib/jsr305-1.3.9.jar lib/jul-to-slf4j-
1.7.16.jar lib/jcl-over-slf4j-1.7.16.jar lib/log4j-1.2.17.jar lib/slf
4j-log4j12-1.7.16.jar lib/compress-lzf-1.0.3.jar lib/snappy-java-1.1.
2.6.jar lib/lz4-1.3.0.jar lib/RoaringBitmap-0.5.11.jar lib/json4s-jac
kson_2.11-3.2.11.jar lib/jersey-client-2.22.2.jar lib/jersey-common-2
.22.2.jar lib/jersey-server-2.22.2.jar lib/jersey-container-servlet-2
.22.2.jar lib/jersey-container-servlet-core-2.22.2.jar lib/netty-3.9.
9.Final.jar lib/stream-2.7.0.jar lib/metrics-core-3.1.2.jar lib/metri
cs-jvm-3.1.2.jar lib/metrics-json-3.1.2.jar lib/metrics-graphite-3.1.
2.jar lib/jackson-module-scala_2.11-2.6.5.jar lib/ivy-2.4.0.jar lib/o
ro-2.0.8.jar lib/pyrolite-4.13.jar lib/py4j-0.10.4.jar lib/commons-cr
ypto-1.0.0.jar lib/janino-3.0.0.jar lib/commons-compiler-3.0.0.jar li
b/antlr4-runtime-4.5.3.jar lib/commons-codec-1.10.jar lib/parquet-com
mon-1.8.2.jar lib/parquet-encoding-1.8.2.jar lib/parquet-format-2.3.1
.jar lib/parquet-jackson-1.8.2.jar lib/jackson-core-2.6.5.jar lib/com
mons-collections-3.2.2.jar lib/commons-compress-1.4.1.jar lib/avro-ip
c-1.7.7.jar lib/avro-ipc-1.7.7-tests.jar lib/kryo-shaded-3.0.3.jar li
b/hadoop-common-2.6.5.jar lib/hadoop-hdfs-2.6.5.jar lib/hadoop-mapred
uce-client-app-2.6.5.jar lib/hadoop-yarn-api-2.6.5.jar lib/hadoop-map
reduce-client-core-2.6.5.jar lib/hadoop-mapreduce-client-jobclient-2.
6.5.jar lib/hadoop-annotations-2.6.5.jar lib/leveldbjni-all-1.8.jar l
ib/httpcore-4.3.3.jar lib/httpclient-4.3.6.jar lib/activation-1.1.1.j
ar lib/mx4j-3.0.2.jar lib/mail-1.4.7.jar lib/bcprov-jdk15on-1.51.jar
lib/java-xmlbuilder-1.0.jar lib/curator-framework-2.6.0.jar lib/zooke
eper-3.4.6.jar lib/guava-16.0.1.jar lib/json4s-core_2.11-3.2.11.jar l
ib/javax.ws.rs-api-2.0.1.jar lib/hk2-api-2.4.0-b34.jar lib/javax.inje
ct-2.4.0-b34.jar lib/hk2-locator-2.4.0-b34.jar lib/javax.annotation-a
pi-1.2.jar lib/jersey-guava-2.22.2.jar lib/osgi-resource-locator-1.0.
1.jar lib/jersey-media-jaxb-2.22.2.jar lib/validation-api-1.1.0.Final
.jar lib/jackson-module-paranamer-2.6.5.jar lib/xz-1.0.jar lib/minlog
-1.3.0.jar lib/objenesis-2.1.jar lib/commons-cli-1.2.jar lib/xmlenc-0
.52.jar lib/commons-httpclient-3.1.jar lib/commons-io-2.4.jar lib/com
mons-lang-2.6.jar lib/commons-configuration-1.6.jar lib/protobuf-java
-2.5.0.jar lib/gson-2.2.4.jar lib/hadoop-auth-2.6.5.jar lib/curator-c
lient-2.6.0.jar lib/htrace-core-3.0.4.jar lib/jetty-util-6.1.26.jar l
ib/xercesImpl-2.9.1.jar lib/hadoop-mapreduce-client-common-2.6.5.jar
lib/hadoop-mapreduce-client-shuffle-2.6.5.jar lib/hadoop-yarn-common-
2.6.5.jar lib/base64-2.3.8.jar lib/json4s-ast_2.11-3.2.11.jar lib/sca
lap-2.11.0.jar lib/hk2-utils-2.4.0-b34.jar lib/aopalliance-repackaged
-2.4.0-b34.jar lib/javassist-3.18.1-GA.jar lib/commons-digester-1.8.j
ar lib/commons-beanutils-core-1.8.0.jar lib/apacheds-kerberos-codec-2
.0.0-M15.jar lib/xml-apis-1.3.04.jar lib/hadoop-yarn-client-2.6.5.jar
lib/hadoop-yarn-server-common-2.6.5.jar lib/hadoop-yarn-server-nodem
anager-2.6.5.jar lib/jaxb-api-2.2.2.jar lib/jackson-jaxrs-1.9.13.jar
lib/jackson-xc-1.9.13.jar lib/guice-3.0.jar lib/scala-compiler-2.11.0
.jar lib/javax.inject-1.jar lib/jline-0.9.94.jar lib/apacheds-i18n-2.
0.0-M15.jar lib/api-asn1-api-1.0.0-M20.jar lib/api-util-1.0.0-M20.jar
lib/jettison-1.1.jar lib/stax-api-1.0-2.jar lib/aopalliance-1.0.jar
lib/cglib-2.2.1-v20090111.jar lib/scala-xml_2.11-1.0.1.jar lib/scala-
parser-combinators_2.11-1.0.1.jar lib/scala-library-2.11.8.jar lib/sl
f4j-api-1.7.16.jar lib/netty-all-4.0.43.Final.jar lib/jackson-core-as
l-1.9.13.jar lib/jackson-mapper-asl-1.9.13.jar lib/jackson-annotation
s-2.6.5.jar lib/commons-net-3.1.jar lib/paranamer-2.6.jar
UPDATE
As a few of the comments and the answer by Allison Berman suggested, I tried the following:
C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit --jars OrderAnalytics.jar \ --class example.DataMigration
Error: Cannot load main class from JAR file:/C:/
Run with --help for usage help or --verbose for debug output
C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit --jars OrderAnalytics.jar --class example.DataMigration
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
at org.apache.spark.launcher.Main.main(Main.java:86)
But according to the Spark documentation it should be as below; with this form the job starts, but it still cannot find all the dependent jars:
C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit --class example.DataMigration OrderAnalytics.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/08/21 21:59:36 INFO SparkContext: Running Spark version 2.2.0
17/08/21 21:59:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/21 21:59:36 INFO SparkContext: Submitted application: DataMigration
17/08/21 21:59:36 INFO SecurityManager: Changing view acls to: ram
17/08/21 21:59:36 INFO SecurityManager: Changing modify acls to: ram
17/08/21 21:59:36 INFO SecurityManager: Changing view acls groups to:
17/08/21 21:59:36 INFO SecurityManager: Changing modify acls groups to:
17/08/21 21:59:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ram); groups with view permissions: Set(); users with modify permissions: Set(ram); groups with modify permissions: Set()
17/08/21 21:59:37 INFO Utils: Successfully started service 'sparkDriver' on port 62239.
17/08/21 21:59:37 INFO SparkEnv: Registering MapOutputTracker
17/08/21 21:59:37 INFO SparkEnv: Registering BlockManagerMaster
17/08/21 21:59:37 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/21 21:59:37 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/21 21:59:37 INFO DiskBlockManager: Created local directory at C:\Users\ram\AppData\Local\Temp\blockmgr-38ef35e6-219e-450c-b7da-c8075464a232
17/08/21 21:59:37 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/21 21:59:37 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/21 21:59:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/21 21:59:37 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
17/08/21 21:59:38 INFO SparkContext: Added JAR file:/C:/Dev-Tra/OrderAnalytics/build/libs/OrderAnalytics.jar at spark://192.168.1.101:62239/jars/OrderAnalytics.jar with timestamp 1503332978023
17/08/21 21:59:38 INFO Executor: Starting executor ID driver on host localhost
17/08/21 21:59:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62248.
17/08/21 21:59:38 INFO NettyBlockTransferService: Server created on 192.168.1.101:62248
17/08/21 21:59:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/21 21:59:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:38 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:62248 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:40 INFO Native: Could not load JNR C Library, native system calls through this library will not be available (set this logger level to DEBUG to see the full stack trace).
17/08/21 21:59:40 INFO ClockFactory: Using java.lang.System clock to generate timestamps.
17/08/21 21:59:41 WARN NettyUtil: Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
17/08/21 21:59:41 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
17/08/21 21:59:41 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/08/21 21:59:42 INFO SparkContext: Starting job: runJob at RDDFunctions.scala:36
17/08/21 21:59:42 INFO DAGScheduler: Got job 0 (runJob at RDDFunctions.scala:36) with 4 output partitions
17/08/21 21:59:42 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at RDDFunctions.scala:36)
17/08/21 21:59:42 INFO DAGScheduler: Parents of final stage: List()
17/08/21 21:59:42 INFO DAGScheduler: Missing parents: List()
17/08/21 21:59:42 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at DataMigration.scala:18), which has no missing parents
17/08/21 21:59:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 12.3 KB, free 366.3 MB)
17/08/21 21:59:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.8 KB, free 366.3 MB)
17/08/21 21:59:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:62248 (size: 5.8 KB, free: 366.3 MB)
17/08/21 21:59:42 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/08/21 21:59:42 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at DataMigration.scala:18) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
17/08/21 21:59:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/08/21 21:59:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, NODE_LOCAL, 17002 bytes)
17/08/21 21:59:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/21 21:59:42 INFO Executor: Fetching spark://192.168.1.101:62239/jars/OrderAnalytics.jar with timestamp 1503332978023
17/08/21 21:59:42 INFO TransportClientFactory: Successfully created connection to /192.168.1.101:62239 after 18 ms (0 ms spent in bootstraps)
17/08/21 21:59:42 INFO Utils: Fetching spark://192.168.1.101:62239/jars/OrderAnalytics.jar to C:\Users\ram\AppData\Local\Temp\spark-73cbbbe8-9e06-4a11-976a-a766305d4148\userFiles-3e4c9dea-6273-4d9e-a17b-c807aa0e3da5\fetchFileTemp7196411614488839489.tmp
17/08/21 21:59:43 INFO Executor: Adding file:/C:/Users/ram/AppData/Local/Temp/spark-73cbbbe8-9e06-4a11-976a-a766305d4148/userFiles-3e4c9dea-6273-4d9e-a17b-c807aa0e3da5/OrderAnalytics.jar to class loader
17/08/21 21:59:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:44 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, NODE_LOCAL, 15334 bytes)
17/08/21 21:59:44 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/21 21:59:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
17/08/21 21:59:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/08/21 21:59:44 INFO TaskSchedulerImpl: Cancelling stage 0
17/08/21 21:59:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/08/21 21:59:44 INFO TaskSchedulerImpl: Stage 0 was cancelled
17/08/21 21:59:44 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on localhost, executor driver: java.lang.NoClassDefFoundError (com/twitter/jsr166e/LongAdder) [duplicate 1]
17/08/21 21:59:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/08/21 21:59:44 INFO DAGScheduler: ResultStage 0 (runJob at RDDFunctions.scala:36) failed in 1.894 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
Driver stacktrace:
17/08/21 21:59:44 INFO DAGScheduler: Job 0 failed: runJob at RDDFunctions.scala:36, took 2.152376 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2075)
at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:36)
at example.DataMigration$.main(DataMigration.scala:20)
at example.DataMigration.main(DataMigration.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:51 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
17/08/21 21:59:52 INFO SerialShutdownHooks: Successfully executed shutdown hook: Clearing session cache for C* connector
17/08/21 21:59:52 INFO SparkContext: Invoking stop() from shutdown hook
17/08/21 21:59:52 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
17/08/21 21:59:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/21 21:59:52 INFO MemoryStore: MemoryStore cleared
17/08/21 21:59:52 INFO BlockManager: BlockManager stopped
17/08/21 21:59:52 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/21 21:59:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/21 21:59:52 INFO SparkContext: Successfully stopped SparkContext
17/08/21 21:59:52 INFO ShutdownHookManager: Shutdown hook called
17/08/21 21:59:52 INFO ShutdownHookManager: Deleting directory C:\Users\ram\AppData\Local\Temp\spark-73cbbbe8-9e06-4a11-976a-a766305d4148
Can anyone please tell me why Spark is not able to pick up the classpath from the jar, or how to resolve this problem?
Thanks
Indrajit is correct: you need to include the package. I had similar issues when I left my files in the default package. Make your folder structure follow the standard layout described here: http://www.scala-sbt.org/0.13/docs/Directories.html
Add a new folder YOUR_PACKAGE in src/main/scala or src/main/java and put DataMigration in YOUR_PACKAGE. Make sure the first line of DataMigration is:
package YOUR_PACKAGE
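For example, the file would then start like this (a hypothetical skeleton, since the original DataMigration source was not posted):
package YOUR_PACKAGE

import org.apache.spark.{SparkConf, SparkContext}

object DataMigration {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DataMigration")
    val sc = new SparkContext(conf)
    // Cassandra-to-Cassandra copy logic goes here.
    sc.stop()
  }
}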
Your spark-submit will then be:
spark-submit --class YOUR_PACKAGE.DataMigration OrderAnalytics.jar
Note that the application jar must be the positional argument after the options; passing it only via --jars with no application resource is exactly what produced the "Missing application resource" error in the update above.
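One more note: even with the class name fixed, the NoClassDefFoundError for com.twitter.jsr166e.LongAdder can persist, because the Class-Path entries in the manifest point at lib/** next to the jar, while the executor fetches only OrderAnalytics.jar itself into a temp directory (see the "Fetching spark://... OrderAnalytics.jar" lines in the log above), so the nested dependency jars never end up next to it. A sketch of one way around this, with jar names taken from the manifest above, is to ship the dependencies explicitly:
spark-submit --class YOUR_PACKAGE.DataMigration --jars lib/jsr166e-1.1.0.jar,lib/spark-cassandra-connector_2.11-2.0.3.jar OrderAnalytics.jar
Alternatively, build a single fat/shaded jar (for example with Gradle's shadow plugin) so the dependency classes live directly inside OrderAnalytics.jar instead of under lib/.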