I can't debug my program in IntelliJ IDEA CE - Scala

I get "Disconnected from the target VM, address: '127.0.0.1:39989', transport: 'socket'" in IntelliJ IDEA CE and I can't debug my program. Any suggestions? The full output is below:
Connected to the target VM, address: '127.0.0.1:39989', transport: 'socket'
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/29 17:29:47 INFO SparkContext: Running Spark version 2.1.2
17/12/29 17:29:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/29 17:29:49 WARN Utils: Your hostname, ashfaq-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
17/12/29 17:29:49 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/12/29 17:29:49 INFO SecurityManager: Changing view acls to: ashfaq
17/12/29 17:29:49 INFO SecurityManager: Changing modify acls to: ashfaq
17/12/29 17:29:49 INFO SecurityManager: Changing view acls groups to:
17/12/29 17:29:49 INFO SecurityManager: Changing modify acls groups to:
17/12/29 17:29:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ashfaq); groups with view permissions: Set(); users with modify permissions: Set(ashfaq); groups with modify permissions: Set()
17/12/29 17:29:51 INFO Utils: Successfully started service 'sparkDriver' on port 46133.
17/12/29 17:29:51 INFO SparkEnv: Registering MapOutputTracker
17/12/29 17:29:51 INFO SparkEnv: Registering BlockManagerMaster
17/12/29 17:29:51 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/12/29 17:29:51 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/12/29 17:29:51 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b3b48105-28be-4781-a395-c7e83cc72e8c
17/12/29 17:29:51 INFO MemoryStore: MemoryStore started with capacity 393.1 MB
17/12/29 17:29:51 INFO SparkEnv: Registering OutputCommitCoordinator
17/12/29 17:29:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/12/29 17:29:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
17/12/29 17:29:53 INFO Executor: Starting executor ID driver on host localhost
17/12/29 17:29:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33583.
17/12/29 17:29:54 INFO NettyBlockTransferService: Server created on 10.0.2.15:33583
17/12/29 17:29:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/29 17:29:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33583 with 393.1 MB RAM, BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 33583, None)
17/12/29 17:29:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 236.5 KB, free 392.8 MB)
17/12/29 17:29:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 392.8 MB)
17/12/29 17:29:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33583 (size: 22.9 KB, free: 393.1 MB)
17/12/29 17:29:59 INFO SparkContext: Created broadcast 0 from textFile at scalaApp.scala:13
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ashfaq/Desktop/saclaAPP/data/UserPurchaseHistory.csv
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1968)
at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
at ScalaApp$.main(scalaApp.scala:18)
at ScalaApp.main(scalaApp.scala)
17/12/29 17:29:59 INFO SparkContext: Invoking stop() from shutdown hook
17/12/29 17:29:59 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
17/12/29 17:29:59 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.0.2.15:33583 in memory (size: 22.9 KB, free: 393.1 MB)
17/12/29 17:29:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/29 17:30:00 INFO MemoryStore: MemoryStore cleared
17/12/29 17:30:00 INFO BlockManager: BlockManager stopped
17/12/29 17:30:00 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/29 17:30:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/12/29 17:30:00 INFO SparkContext: Successfully stopped SparkContext
17/12/29 17:30:00 INFO ShutdownHookManager: Shutdown hook called
Disconnected from the target VM, address: '127.0.0.1:39989', transport: 'socket'
17/12/29 17:30:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-58667739-7c15-4665-8ede-fde9c3ff1d83
Process finished with exit code 1

It looks like you are trying to open a file that doesn't exist. The first line of the error message says so:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ashfaq/Desktop/saclaAPP/data/UserPurchaseHistory.csv
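The stack trace points at the textFile call on scalaApp.scala:13. Below is a minimal sketch of how you might fail fast on a missing input instead; this is not the original scalaApp.scala (which isn't shown here), and the path is copied from the error message, so adjust it to wherever UserPurchaseHistory.csv actually lives:

import java.nio.file.{Files, Paths}
import org.apache.spark.{SparkConf, SparkContext}

object ScalaApp {
  def main(args: Array[String]): Unit = {
    // Path taken from the error message; point it at the real location of the CSV.
    val inputPath = "/home/ashfaq/Desktop/saclaAPP/data/UserPurchaseHistory.csv"

    // Fail fast with a readable message instead of a deep Hadoop stack trace.
    require(Files.exists(Paths.get(inputPath)), s"Input file not found: $inputPath")

    val sc = new SparkContext(new SparkConf().setAppName("ScalaApp").setMaster("local[*]"))
    val purchases = sc.textFile("file://" + inputPath)
    println(s"Records: ${purchases.count()}")
    sc.stop()
  }
}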

Related

Failing to execute spark-submit command on a sample word count project

I am doing a Pluralsight tutorial on Apache Spark, which is a simple word counter. I am on Windows 11 and I have IntelliJ IDEA 2022.3.1 (Ultimate Edition). Additionally, on my machine I have JDK 8, Apache Spark 3.3.1 pre-built for Hadoop 3.3 and later, and Hadoop 3.3.4. The code is written in Scala with SBT as the build tool, and I've included the code below. After packaging the file with sbt package I run the command
spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
I am receiving an exception
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat; (Full log below)
I have my dev tools (Java, Spark, Hadoop, etc) under C:\DevTools\TOOL and the Windows Environment variables are set as follows:
JAVA_HOME -> C:\DevTools\TOOL\Java
SPARK_HOME -> C:\DevTools\TOOL\Spark
HADOOP_HOME -> C:\DevTools\TOOL\Hadoop
PATH -> %JAVA_HOME%\bin; %SPARK_HOME%\bin; %HADOOP_HOME%\bin
Lastly, I've downloaded various winutils.exe and hadoop.dll builds and put them in the Spark bin folder and the Hadoop bin folder, but nothing seems to work. Does anyone have any suggestions as to how I can get this to execute successfully?
build.sbt
name := "Word Count"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
WordCount.scala
package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]): Unit = {
    val configuration = new SparkConf().setAppName("Word Counter")
    val sparkContext = new SparkContext(configuration)
    val textFile = sparkContext.textFile("file:///DevTools/TOOL/Spark")
    val tokenizedFileData = textFile.flatMap(line => line.split(" "))
    val countPrep = tokenizedFileData.map(word => (word, 1))
    val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
    val storedCounts = counts.sortBy(kvPair => kvPair._2, false)
    storedCounts.saveAsTextFile("file:///DevTools/TOOL/Spark/output")
  }
}
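As a side note on the build: the build.sbt above pins Scala 2.11 and Spark 1.6.1, while the spark-submit being used is Spark 3.3.1. A sketch of a build definition matched to that runtime, assuming the default Scala 2.12 build that the Spark 3.3.1 download ships with (version numbers are illustrative, not verified against this setup):

name := "Word Count"

version := "0.1"

scalaVersion := "2.12.17"

val sparkVersion = "3.3.1"

libraryDependencies ++= Seq(
  // "provided" keeps Spark classes out of the packaged jar; spark-submit supplies them at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)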
Full Log
PS C:\Users\user\Documents\Projects\WordCount> spark-submit --class "main.WordCount" --master "local[*]" "C:\Users\user\Documents\Projects\WordCount\target\scala-2.11\word-count_2.11-0.1.jar"
23/01/26 17:00:08 INFO SparkContext: Running Spark version 3.3.1
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/26 17:00:08 INFO ResourceUtils: ==============================================================
23/01/26 17:00:08 INFO SparkContext: Submitted application: Word Counter
23/01/26 17:00:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/26 17:00:08 INFO ResourceProfile: Limiting resource is cpu
23/01/26 17:00:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/26 17:00:08 INFO SecurityManager: Changing view acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls to: user
23/01/26 17:00:08 INFO SecurityManager: Changing view acls groups to:
23/01/26 17:00:08 INFO SecurityManager: Changing modify acls groups to:
23/01/26 17:00:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
23/01/26 17:00:09 INFO Utils: Successfully started service 'sparkDriver' on port 50249.
23/01/26 17:00:09 INFO SparkEnv: Registering MapOutputTracker
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMaster
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/26 17:00:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/26 17:00:09 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/26 17:00:09 INFO DiskBlockManager: Created local directory at C:\Users\user\AppData\Local\Temp\blockmgr-c7d05098-5b05-4121-b1b6-2e7445fc9240
23/01/26 17:00:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
23/01/26 17:00:09 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/26 17:00:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/26 17:00:10 INFO SparkContext: Added JAR file:/C:/Users/user/Documents/Projects/WordCount/target/scala-2.11/word-count_2.11-0.1.jar at spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO Executor: Starting executor ID driver on host localhost
23/01/26 17:00:10 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/01/26 17:00:10 INFO Executor: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar with timestamp 1674770408345
23/01/26 17:00:10 INFO TransportClientFactory: Successfully created connection to localhost/192.168.1.221:50249 after 58 ms (0 ms spent in bootstraps)
23/01/26 17:00:10 INFO Utils: Fetching spark://localhost:50249/jars/word-count_2.11-0.1.jar to C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6\userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a\fetchFileTemp8156211875497724521.tmp
23/01/26 17:00:11 INFO Executor: Adding file:/C:/Users/user/AppData/Local/Temp/spark-d7979eef-eac8-4a89-8ee0-246a821703d6/userFiles-8222f8d5-3999-47a7-b048-a9c37e66150a/word-count_2.11-0.1.jar to class loader
23/01/26 17:00:11 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50306.
23/01/26 17:00:11 INFO NettyBlockTransferService: Server created on localhost:50306
23/01/26 17:00:11 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/01/26 17:00:11 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50306 with 366.3 MiB RAM, BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:11 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 50306, None)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 358.0 KiB, free 366.0 MiB)
23/01/26 17:00:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.3 KiB, free 365.9 MiB)
23/01/26 17:00:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50306 (size: 32.3 KiB, free: 366.3 MiB)
23/01/26 17:00:12 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:13
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:608)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:934)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:848)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:816)
at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2199)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2179)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:244)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$reduceByKey$4(PairRDDFunctions.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:323)
at main.WordCount$.main(WordCount.scala:16)
at main.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/01/26 17:00:12 INFO SparkContext: Invoking stop() from shutdown hook
23/01/26 17:00:12 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
23/01/26 17:00:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/26 17:00:12 INFO MemoryStore: MemoryStore cleared
23/01/26 17:00:12 INFO BlockManager: BlockManager stopped
23/01/26 17:00:12 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/26 17:00:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/26 17:00:12 INFO SparkContext: Successfully stopped SparkContext
23/01/26 17:00:12 INFO ShutdownHookManager: Shutdown hook called
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-d7979eef-eac8-4a89-8ee0-246a821703d6
23/01/26 17:00:12 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-26625e11-a7f1-41f7-b2b3-29f97ea9e75a

Spark MongoDB connector V10 issue

I am having a problem connecting to MongoDB using Spark Structured Streaming.
Here is my Python code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
# from lib.logger import Log4j

if __name__ == "__main__":
    spark = (SparkSession
             .builder
             .appName("Streaming from mongo db")
             .master("local[3]")
             .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_10.0.5:3.3.1')
             .config("spark.streaming.stopGracefullyOnShutdown", "true")
             # .config("spark.sql.shuffle.partitions", 3)
             .getOrCreate())

    read_from_mongo = (spark
                       .readStream
                       .format("mongodb")
                       .option("uri", "mongodb://admin:admin@localhost:27017")
                       .option("database", "first_db")
                       .option("collection", "first_collection")
                       .load()
                       .writeStream
                       .format("console")
                       .trigger(continuous="1 second")
                       .outputMode("append"))

    y = read_from_mongo.start()
I am running the script using spark-submit file_name.py, and the output I'm getting is the following:
22/11/16 16:04:21 INFO SparkContext: Running Spark version 3.3.1
22/11/16 16:04:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/16 16:04:21 INFO ResourceUtils: ==============================================================
22/11/16 16:04:21 INFO ResourceUtils: No custom resources configured for spark.driver.
22/11/16 16:04:21 INFO ResourceUtils: ==============================================================
22/11/16 16:04:21 INFO SparkContext: Submitted application: Streaming from mongo db
22/11/16 16:04:21 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/11/16 16:04:21 INFO ResourceProfile: Limiting resource is cpu
22/11/16 16:04:21 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/11/16 16:04:21 INFO SecurityManager: Changing view acls to: mustaphaaminedebbih
22/11/16 16:04:21 INFO SecurityManager: Changing modify acls to: mustaphaaminedebbih
22/11/16 16:04:21 INFO SecurityManager: Changing view acls groups to:
22/11/16 16:04:21 INFO SecurityManager: Changing modify acls groups to:
22/11/16 16:04:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mustaphaaminedebbih); groups with view permissions: Set(); users with modify permissions: Set(mustaphaaminedebbih); groups with modify permissions: Set()
22/11/16 16:04:21 INFO Utils: Successfully started service 'sparkDriver' on port 52063.
22/11/16 16:04:21 INFO SparkEnv: Registering MapOutputTracker
22/11/16 16:04:21 INFO SparkEnv: Registering BlockManagerMaster
22/11/16 16:04:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/11/16 16:04:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/11/16 16:04:21 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/11/16 16:04:21 INFO DiskBlockManager: Created local directory at /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/blockmgr-53c25f51-35a7-44ab-bc1d-bb16a7ae050c
22/11/16 16:04:22 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
22/11/16 16:04:22 INFO SparkEnv: Registering OutputCommitCoordinator
22/11/16 16:04:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/11/16 16:04:22 INFO Executor: Starting executor ID driver on host 192.168.9.44
22/11/16 16:04:22 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
22/11/16 16:04:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52065.
22/11/16 16:04:22 INFO NettyBlockTransferService: Server created on 192.168.9.44:52065
22/11/16 16:04:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/11/16 16:04:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.9.44:52065 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.9.44, 52065, None)
22/11/16 16:04:22 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/11/16 16:04:22 INFO SharedState: Warehouse path is 'file:/Users/mustaphaaminedebbih/Desktop/Scrambling/Spark%20Streaming/Streaming%20From%20Mongodb/spark-warehouse'.
Traceback (most recent call last):
File "/Users/mustaphaaminedebbih/Desktop/Scrambling/Spark Streaming/Streaming From Mongodb/test.py", line 19, in
read_from_mongo = (spark
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/streaming.py", line 469, in load
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in call
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/Users/mustaphaaminedebbih/spark3/spark-3.3.1-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
java.lang.ClassNotFoundException:
Failed to find data source: mongodb. Please find packages at
https://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:157)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:144)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: mongodb.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:435)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)
... 14 more
22/11/16 16:04:23 INFO SparkContext: Invoking stop() from shutdown hook
22/11/16 16:04:23 INFO SparkUI: Stopped Spark web UI at http://192.168.9.44:4040
22/11/16 16:04:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/16 16:04:23 INFO MemoryStore: MemoryStore cleared
22/11/16 16:04:23 INFO BlockManager: BlockManager stopped
22/11/16 16:04:23 INFO BlockManagerMaster: BlockManagerMaster stopped
22/11/16 16:04:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/16 16:04:23 INFO SparkContext: Successfully stopped SparkContext
22/11/16 16:04:23 INFO ShutdownHookManager: Shutdown hook called
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-2ac1c680-c1d8-46e1-bfae-40c82cee015f
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-1816dfc9-2a95-4361-88d1-025c999f1514
22/11/16 16:04:23 INFO ShutdownHookManager: Deleting directory /private/var/folders/yt/jz2t42md7qx68kjlydk19x5w0000gn/T/spark-1816dfc9-2a95-4361-88d1-025c999f1514/pyspark-095a2954-0010-4da3-b658-2483dcc79afc
I tried almost every possible solution, but nothing worked

Exception in thread "main" java.lang.NullPointerException com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke

I would like to read a parquet file from Azure Blob, so I have mounted the data from Azure Blob locally with dbutils.fs.mount.
But I get the error Exception in thread "main" java.lang.NullPointerException.
Below is my log:
hello big data
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/10 23:20:10 INFO SparkContext: Running Spark version 2.1.0
20/06/10 23:20:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/10 23:20:11 INFO SecurityManager: Changing view acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing view acls groups to:
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls groups to:
20/06/10 23:20:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Admin); groups with view permissions: Set(); users with modify permissions: Set(Admin); groups with modify permissions: Set()
20/06/10 23:20:12 INFO Utils: Successfully started service 'sparkDriver' on port 4725.
20/06/10 23:20:12 INFO SparkEnv: Registering MapOutputTracker
20/06/10 23:20:13 INFO SparkEnv: Registering BlockManagerMaster
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/10 23:20:13 INFO DiskBlockManager: Created local directory at C:\Users\Admin\AppData\Local\Temp\blockmgr-c023c3b8-fd70-461a-ac69-24ce9c770efe
20/06/10 23:20:13 INFO MemoryStore: MemoryStore started with capacity 894.3 MB
20/06/10 23:20:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/10 23:20:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/10 23:20:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
20/06/10 23:20:13 INFO Executor: Starting executor ID driver on host localhost
20/06/10 23:20:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4738.
20/06/10 23:20:13 INFO NettyBlockTransferService: Server created on 192.168.0.102:4738
20/06/10 23:20:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/10 23:20:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.102:4738 with 894.3 MB RAM, BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:14 INFO SharedState: Warehouse path is 'file:/E:/sparkdemo/sparkdemo/spark-warehouse/'.
Exception in thread "main" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke(DBUtilsHolder.scala:17)
at com.sun.proxy.$Proxy7.fs(Unknown Source)
at Transform$.main(Transform.scala:19)
at Transform.main(Transform.scala)
20/06/10 23:20:14 INFO SparkContext: Invoking stop() from shutdown hook
20/06/10 23:20:14 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
20/06/10 23:20:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/10 23:20:14 INFO MemoryStore: MemoryStore cleared
20/06/10 23:20:14 INFO BlockManager: BlockManager stopped
20/06/10 23:20:14 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/10 23:20:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/10 23:20:14 INFO SparkContext: Successfully stopped SparkContext
20/06/10 23:20:14 INFO ShutdownHookManager: Shutdown hook called
20/06/10 23:20:14 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbdbcfe7-bc70-4d34-ad8e-5baed8308ae2
My code:
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils
import org.apache.spark.sql.SparkSession

object Demo {
  def main(args: Array[String]): Unit = {
    println("hello big data")

    val containerName = "container1"
    val storageAccountName = "storageaccount1"
    val sas = "saskey"
    val url = "wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/"
    var config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"

    // Spark session
    val spark: SparkSession = SparkSession.builder
      .appName("SpartDemo")
      .master("local[1]")
      .getOrCreate()

    // Mount data
    dbutils.fs.mount(
      source = url,
      mountPoint = "/mnt/container1",
      extraConfigs = Map(config -> sas))

    val parquetFileDF = spark.read.parquet("/mnt/container1/test1.parquet")
    parquetFileDF.show()
  }
}
My sbt file:
name := "sparkdemo1"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "com.databricks" % "dbutils-api_2.11" % "0.0.3",
  "org.apache.spark" % "spark-core_2.11" % "2.1.0",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
)
Are you running this on a Databricks instance?
If not, that's the problem: dbutils is provided by the Databricks execution context.
In that case, as far as I know, you have three options:
Package your application into a jar file and run it using a Databricks job
Use databricks-connect
Try to emulate a mocked dbutils instance outside Databricks as shown here:
com.databricks.dbutils_v1.DBUtilsHolder.dbutils0.set(
  new com.databricks.dbutils_v1.DBUtilsV1 {
    ...
  }
)
Anyway, I'd say that options 1 and 2 are better than the third one. Also, by choosing one of those you don't need to include the "dbutils-api_2.11" dependency, as it is provided by the Databricks cluster.
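If the dbutils-api dependency is kept for compilation anyway (for example with option 1), one common pattern is to mark it and the Spark artifacts as "provided" in sbt so they are compiled against but not bundled into the jar; a small sketch against the build.sbt from the question:

libraryDependencies ++= Seq(
  // compile against these, but let the Databricks runtime supply them
  "com.databricks" % "dbutils-api_2.11" % "0.0.3" % "provided",
  "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.0" % "provided"
)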

Spark streaming with Kafka connector stopping

I am getting started with Spark Streaming. I want to read a stream from Kafka with the sample code I found in the Spark documentation: https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html
Here is my code:
object SparkStreaming {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Test_kafka_spark").setMaster("local[*]") // local parallelism 1
    val ssc = new StreamingContext(conf, Seconds(1))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9093",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = Array("spark")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    stream.map(record => (record.key, record.value))
  }
}
Everything seemed to start well, but the job stopped immediately. The logs are as follows:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/04/19 14:37:37 INFO SparkContext: Running Spark version 2.1.0
17/04/19 14:37:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/19 14:37:37 WARN Utils: Your hostname, thibaut-Precision-M4600 resolves to a loopback address: 127.0.1.1; using 10.192.176.101 instead (on interface eno1)
17/04/19 14:37:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/04/19 14:37:37 INFO SecurityManager: Changing view acls to: thibaut
17/04/19 14:37:37 INFO SecurityManager: Changing modify acls to: thibaut
17/04/19 14:37:37 INFO SecurityManager: Changing view acls groups to:
17/04/19 14:37:37 INFO SecurityManager: Changing modify acls groups to:
17/04/19 14:37:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(thibaut); groups with view permissions: Set(); users with modify permissions: Set(thibaut); groups with modify permissions: Set()
17/04/19 14:37:37 INFO Utils: Successfully started service 'sparkDriver' on port 41046.
17/04/19 14:37:37 INFO SparkEnv: Registering MapOutputTracker
17/04/19 14:37:37 INFO SparkEnv: Registering BlockManagerMaster
17/04/19 14:37:37 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/04/19 14:37:37 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/19 14:37:37 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-266e2f13-0eb2-40a8-9d2f-d50797099a29
17/04/19 14:37:37 INFO MemoryStore: MemoryStore started with capacity 879.3 MB
17/04/19 14:37:37 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/19 14:37:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/04/19 14:37:38 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.192.176.101:4040
17/04/19 14:37:38 INFO Executor: Starting executor ID driver on host localhost
17/04/19 14:37:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39207.
17/04/19 14:37:38 INFO NettyBlockTransferService: Server created on 10.192.176.101:39207
17/04/19 14:37:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/19 14:37:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.192.176.101, 39207, None)
17/04/19 14:37:38 INFO BlockManagerMasterEndpoint: Registering block manager 10.192.176.101:39207 with 879.3 MB RAM, BlockManagerId(driver, 10.192.176.101, 39207, None)
17/04/19 14:37:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.192.176.101, 39207, None)
17/04/19 14:37:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.192.176.101, 39207, None)
17/04/19 14:37:38 WARN KafkaUtils: overriding enable.auto.commit to false for executor
17/04/19 14:37:38 WARN KafkaUtils: overriding auto.offset.reset to none for executor
17/04/19 14:37:38 WARN KafkaUtils: overriding executor group.id to spark-executor-test
17/04/19 14:37:38 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
17/04/19 14:37:38 INFO SparkContext: Invoking stop() from shutdown hook
17/04/19 14:37:38 INFO SparkUI: Stopped Spark web UI at http://10.192.176.101:4040
17/04/19 14:37:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/19 14:37:38 INFO MemoryStore: MemoryStore cleared
17/04/19 14:37:38 INFO BlockManager: BlockManager stopped
17/04/19 14:37:38 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/19 14:37:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/19 14:37:38 INFO SparkContext: Successfully stopped SparkContext
17/04/19 14:37:38 INFO ShutdownHookManager: Shutdown hook called
17/04/19 14:37:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-f28a1361-58ba-416b-ac8e-11da0044c1f2
Thanks for any help.
It appears you haven't started your StreamingContext.
Try adding these two lines at the end:
ssc.start
ssc.awaitTermination
You did not call any action on the DStream, so nothing gets executed (map is a transformation and is lazy); you also need to start the StreamingContext.
Please look into this complete example.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala
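Putting both answers together, here is a sketch of the code from the question with an output operation added and the context started; print() is used only as the simplest way to confirm that records arrive, and the imports shown are the usual ones for the kafka-0-10 integration:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object SparkStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test_kafka_spark").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9093",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("spark"), kafkaParams)
    )

    // An output operation is required: map alone is lazy and never runs.
    stream.map(record => (record.key, record.value)).print()

    ssc.start()            // start the streaming computation
    ssc.awaitTermination() // keep the driver alive instead of exiting immediately
  }
}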

Spark cassandra connector doesn't work in Standalone Spark cluster

I have a Maven Scala application that submits a Spark job to a standalone single-node Spark cluster. When the job is submitted, the application tries to access Cassandra, which is hosted on an Amazon EC2 instance, using spark-cassandra-connector. The connection is established, but no results are returned, and after some time the connector disconnects. It works fine if I run Spark in local mode.
I tried to create a simple application, and my code looks like this:
val sc = SparkContextLoader.getSC

def runSparkJob(): Unit = {
  val table = sc.cassandraTable("prosolo_logs_zj", "logevents")
  println(table.collect().mkString("\n"))
}
SparkContext.scala
object SparkContextLoader {
  val sparkConf = new SparkConf()
  sparkConf.setMaster("spark://127.0.1.1:7077")
  sparkConf.set("spark.cores_max", "2")
  sparkConf.set("spark.executor.memory", "2g")
  sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  sparkConf.setAppName("Test application")
  sparkConf.set("spark.cassandra.connection.host", "xxx.xxx.xxx.xxx")
  sparkConf.set("spark.cassandra.connection.port", "9042")
  sparkConf.set("spark.ui.port", "4041")

  val oneJar = "/samplesparkmaven/target/samplesparkmaven-jar.jar"
  sparkConf.setJars(List(oneJar))

  @transient val sc = new SparkContext(sparkConf)
}
Console output looks like:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/14 23:11:25 INFO SparkContext: Running Spark version 2.1.0
17/02/14 23:11:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/14 23:11:27 WARN Utils: Your hostname, zoran-Latitude-E5420 resolves to a loopback address: 127.0.1.1; using 192.168.2.68 instead (on interface wlp2s0)
17/02/14 23:11:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/14 23:11:27 INFO SecurityManager: Changing view acls to: zoran
17/02/14 23:11:27 INFO SecurityManager: Changing modify acls to: zoran
17/02/14 23:11:27 INFO SecurityManager: Changing view acls groups to:
17/02/14 23:11:27 INFO SecurityManager: Changing modify acls groups to:
17/02/14 23:11:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zoran); groups with view permissions: Set(); users with modify permissions: Set(zoran); groups with modify permissions: Set()
17/02/14 23:11:28 INFO Utils: Successfully started service 'sparkDriver' on port 33995.
17/02/14 23:11:28 INFO SparkEnv: Registering MapOutputTracker
17/02/14 23:11:28 INFO SparkEnv: Registering BlockManagerMaster
17/02/14 23:11:28 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/02/14 23:11:28 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/02/14 23:11:28 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7b25a4cc-cb37-4332-a59b-e36fa45511cd
17/02/14 23:11:28 INFO MemoryStore: MemoryStore started with capacity 870.9 MB
17/02/14 23:11:28 INFO SparkEnv: Registering OutputCommitCoordinator
17/02/14 23:11:28 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/02/14 23:11:28 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.2.68:4041
17/02/14 23:11:28 INFO SparkContext: Added JAR /samplesparkmaven/target/samplesparkmaven-jar.jar at spark://192.168.2.68:33995/jars/samplesparkmaven-jar.jar with timestamp 1487142688817
17/02/14 23:11:28 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.1.1:7077...
17/02/14 23:11:28 INFO TransportClientFactory: Successfully created connection to /127.0.1.1:7077 after 62 ms (0 ms spent in bootstraps)
17/02/14 23:11:29 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170214231129-0016
17/02/14 23:11:29 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36901.
17/02/14 23:11:29 INFO NettyBlockTransferService: Server created on 192.168.2.68:36901
17/02/14 23:11:29 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/02/14 23:11:29 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.68:36901 with 870.9 MB RAM, BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/02/14 23:11:29 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
17/02/14 23:11:31 INFO Cluster: New Cassandra host /xxx.xxx.xxx.xxx:9042 added
17/02/14 23:11:31 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/02/14 23:11:32 INFO SparkContext: Starting job: collect at SparkConnector.scala:28
17/02/14 23:11:32 INFO DAGScheduler: Got job 0 (collect at SparkConnector.scala:28) with 6 output partitions
17/02/14 23:11:32 INFO DAGScheduler: Final stage: ResultStage 0 (collect at SparkConnector.scala:28)
17/02/14 23:11:32 INFO DAGScheduler: Parents of final stage: List()
17/02/14 23:11:32 INFO DAGScheduler: Missing parents: List()
17/02/14 23:11:32 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18), which has no missing parents
17/02/14 23:11:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 8.4 KB, free 870.9 MB)
17/02/14 23:11:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.4 KB, free 870.9 MB)
17/02/14 23:11:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.68:36901 (size: 4.4 KB, free: 870.9 MB)
17/02/14 23:11:32 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/02/14 23:11:32 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18)
17/02/14 23:11:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
17/02/14 23:11:39 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
I'm using
scala 2.11.6
spark 2.1.0 (both for standalone spark and dependency in application)
spark-cassandra-connector 2.0.0-M3
Cassandra Java driver 3.0.0
Apache Cassandra 3.9
The version compatibility table for the Cassandra connector doesn't show any problem with this combination, but I can't figure out what else might be the problem.
I've finally solved the problem. It turned out to be a problem with the path: I was using a local path to the jar but forgot to add "." at the beginning, so it was treated as an absolute path.
Unfortunately, there was no exception in the application indicating that the file doesn't exist at the provided path; the only exception I got was from the worker, which could not find the jar file in the Spark cluster.
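For anyone hitting the same thing, one way to make this failure visible early is to resolve the jar path explicitly and check it before handing it to setJars. A small sketch against the SparkContextLoader above; the relative path is an assumption about the directory the application is launched from:

import java.io.File

// Resolve relative to the working directory, so a missing leading "." cannot silently
// turn the string into an absolute path that doesn't exist.
val jarPath = new File("samplesparkmaven/target/samplesparkmaven-jar.jar").getAbsolutePath
require(new File(jarPath).exists(), s"Jar not found: $jarPath")
sparkConf.setJars(List(jarPath))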