Why did Worker kill executor? - eclipse

I'm writing a Spark application for a Spark standalone cluster. When I run the following code, I get the ClassNotFoundException below (see the screenshot), so I followed the worker's (192.168.111.202) log.
package main

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object mavenTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stream test").setMaster("spark://192.168.111.201:7077")
    val sc = new SparkContext(conf)
    val input = sc.textFile("file:///root/test")
    val words = input.flatMap { line => line.split(" ") }
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    counts.saveAsTextFile("file:///root/mapreduce")
  }
}
The following logs are from the worker. They show that the worker killed the executor and that an error occurred. Why did the worker kill the executor? Could you give me any clue?
16/03/24 20:16:48 INFO Worker: Asked to launch executor app-20160324201648-0011/0 for stream test
16/03/24 20:16:48 INFO SecurityManager: Changing view acls to: root
16/03/24 20:16:48 INFO SecurityManager: Changing modify acls to: root
16/03/24 20:16:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/03/24 20:16:48 INFO ExecutorRunner: Launch command: "/usr/java/jdk1.8.0_73/jre/bin/java" "-cp" "/opt/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/opt/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/opt/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/etc/hadoop" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=40243" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver#192.168.111.201:40243/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.111.202" "--cores" "1" "--app-id" "app-20160324201648-0011" "--worker-url" "akka.tcp://sparkWorker#192.168.111.202:53363/user/Worker"
16/03/24 20:16:54 INFO Worker: Asked to kill executor app-20160324201648-0011/0
16/03/24 20:16:54 INFO ExecutorRunner: Runner thread for executor app-20160324201648-0011/0 interrupted
16/03/24 20:16:54 INFO ExecutorRunner: Killing process!
16/03/24 20:16:54 ERROR FileAppender: Error writing stream to file /opt/spark-1.5.2-bin-hadoop2.6/work/app-20160324201648-0011/0/stderr
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
16/03/24 20:16:54 INFO Worker: Executor app-20160324201648-0011/0 finished with state KILLED exitStatus 143
16/03/24 20:16:54 INFO Worker: Cleaning up local directories for application app-20160324201648-0011
16/03/24 20:16:54 INFO ExternalShuffleBlockResolver: Application app-20160324201648-0011 removed, cleanupLocalDirs = true

I found out it was a memory problem, but I don't fully understand why it happens. Just add the following property to the yarn-site.xml file. The Apache Hadoop documentation says this setting decides whether virtual memory limits are enforced for containers.
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
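If the real cause is the executor running out of memory, a simpler first step (my assumption, not something the logs above confirm) is to give the executors more memory in the application configuration, for example:

val conf = new SparkConf()
  .setAppName("stream test")
  .setMaster("spark://192.168.111.201:7077")
  .set("spark.executor.memory", "2g") // standard Spark setting; "2g" is an arbitrary example value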

What's your Spark version? This is a known Spark bug, fixed in version 1.6.
For more detail, see [SPARK-9844].

Related

System memory 259522560 must be at least 4.718592E8. Please use a larger heap size

I have this error when I run my Spark scripts with version 1.6 of Spark.
My scripts work with version 1.5.
Java version: 1.8
Scala version: 2.11.7
I tried changing the system environment variable JAVA_OPTS=-Xms128m -Xmx512m many times, with different values of Xms and Xmx, but it didn't change anything ...
I also tried to modify the memory settings of IntelliJ:
help/change memory settings...
file/settings/Scala compiler...
Nothing worked.
I have different users on the computer, and Java is installed at the root of the computer while IntelliJ is installed in one user's folder. Can that have an impact?
Here are the logs of the error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/30 17:06:54 INFO SparkContext: Running Spark version 1.6.0
20/04/30 17:06:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/30 17:06:55 INFO SecurityManager: Changing view acls to:
20/04/30 17:06:55 INFO SecurityManager: Changing modify acls to:
20/04/30 17:06:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(); users with modify permissions: Set()
20/04/30 17:06:56 INFO Utils: Successfully started service 'sparkDriver' on port 57698.
20/04/30 17:06:57 INFO Slf4jLogger: Slf4jLogger started
20/04/30 17:06:57 INFO Remoting: Starting remoting
20/04/30 17:06:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#10.1.5.175:57711]
20/04/30 17:06:57 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57711.
20/04/30 17:06:57 INFO SparkEnv: Registering MapOutputTracker
20/04/30 17:06:57 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at batch.BatchJob$.main(BatchJob.scala:23)
at batch.BatchJob.main(BatchJob.scala)
20/04/30 17:06:57 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at batch.BatchJob$.main(BatchJob.scala:23)
at batch.BatchJob.main(BatchJob.scala)
And the beginning of the code:
package batch

import java.lang.management.ManagementFactory
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{SaveMode, SQLContext}

object BatchJob {
  def main(args: Array[String]): Unit = {
    // get spark configuration
    val conf = new SparkConf()
      .setAppName("Lambda with Spark")
    // Check if running from IDE
    if (ManagementFactory.getRuntimeMXBean.getInputArguments.toString.contains("IntelliJ IDEA")) {
      System.setProperty("hadoop.home.dir", "C:\\Libraries\\WinUtils") // required for winutils
      conf.setMaster("local[*]")
    }
    // setup spark context
    val sc = new SparkContext(conf)
    implicit val sqlContext = new SQLContext(sc)
    ...
I finally found a solution:
add -Xms2g -Xmx4g to the VM options directly in the IntelliJ Scala Console settings.
That's the only thing that worked for me.
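If you cannot change the IDE's VM options, a code-side workaround that is often suggested is sketched below. It assumes spark.testing.memory is still the internal setting Spark 1.6's memory manager reads before falling back to the JVM max heap, so treat it as a workaround rather than a documented API:

val conf = new SparkConf()
  .setAppName("Lambda with Spark")
  .setMaster("local[*]")
  .set("spark.testing.memory", "2147480000") // roughly 2 GB, above the 471859200-byte minimum in the error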

Spark streaming from local file to hdfs. textFileStream

I am trying to stream the contents of a local directory to HDFS. This local directory is modified by a script, and content is added to it every 5 seconds. My Spark program streams this local directory's contents and saves them to HDFS. However, when I start streaming, nothing happens.
I checked the logs but I didn't get a hint.
Let me explain the scenario. A shell script moves a file with some data into the local directory every 5 seconds. The batch duration of the streaming context is also 5 seconds. Since the script moves a new file, atomicity is maintained here, if I am not wrong. Every five seconds the receivers should process the data and create a DStream object. I searched about streaming local directories and found that the path should be provided as "file:///my/path". I haven't tried that format yet. But if that is the case, how will the Spark executors on the nodes maintain a common view of the local path provided?
import org.apache.spark._
import org.apache.spark.streaming._

val ssc = new StreamingContext(sc, Seconds(5))
val filestream = ssc.textFileStream("/home/karteekkhadoop/ch06input")

import java.sql.Timestamp
case class Order(time: java.sql.Timestamp, orderId: Long, clientId: Long, symbol: String, amount: Int, price: Double, buy: Boolean)

import java.text.SimpleDateFormat
val orders = filestream.flatMap(line => {
  val dateFormat = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
  var s = line.split(",")
  try {
    assert(s(6) == "B" || s(6) == "S")
    List(Order(new Timestamp(dateFormat.parse(s(0)).getTime()), s(1).toLong, s(2).toLong, s(3), s(4).toInt, s(5).toDouble, s(6) == "B"))
  } catch {
    case e: Throwable => println("Wrong line format (" + e + ") : " + line)
      List()
  }
})

val numPerType = orders.map(o => (o.buy, 1L)).reduceByKey((x, y) => x + y)
numPerType.repartition(1).saveAsTextFiles("/user/karteekkhadoop/ch06output/output", "txt")

ssc.awaitTermination()
The paths given are absolute and exist. I am also including the following logs.
[karteekkhadoop#gw03 stream]$ yarn logs -applicationId application_1540458187951_12531
18/11/21 11:12:35 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/11/21 11:12:35 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
Container: container_e42_1540458187951_12531_01_000001 on wn02.itversity.com:45454
LogAggregationType: LOCAL
==================================================================================
LogType:stderr
LogLastModifiedTime:Wed Nov 21 10:52:00 -0500 2018
LogLength:5320
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hdp01/hadoop/yarn/local/filecache/2693/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/11/21 10:51:57 INFO SignalUtils: Registered signal handler for TERM
18/11/21 10:51:57 INFO SignalUtils: Registered signal handler for HUP
18/11/21 10:51:57 INFO SignalUtils: Registered signal handler for INT
18/11/21 10:51:57 INFO SecurityManager: Changing view acls to: yarn,karteekkhadoop
18/11/21 10:51:57 INFO SecurityManager: Changing modify acls to: yarn,karteekkhadoop
18/11/21 10:51:57 INFO SecurityManager: Changing view acls groups to:
18/11/21 10:51:57 INFO SecurityManager: Changing modify acls groups to:
18/11/21 10:51:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, karteekkhadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, karteekkhadoop); groups with modify permissions: Set()
18/11/21 10:51:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/21 10:51:58 INFO ApplicationMaster: Preparing Local resources
18/11/21 10:51:59 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/11/21 10:51:59 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1540458187951_12531_000001
18/11/21 10:51:59 INFO ApplicationMaster: Waiting for Spark driver to be reachable.
18/11/21 10:51:59 INFO ApplicationMaster: Driver now available: gw03.itversity.com:38932
18/11/21 10:51:59 INFO TransportClientFactory: Successfully created connection to gw03.itversity.com/172.16.1.113:38932 after 90 ms (0 ms spent in bootstraps)
18/11/21 10:51:59 INFO ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.5.0-292/hadoop/conf<CPS>/usr/hdp/2.6.5.0-292/hadoop/*<CPS>/usr/hdp/2.6.5.0-292/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.6.5.0-292/hadoop/lib/hadoop-lzo-0.6.0.2.6.5.0-292.jar:/etc/hadoop/conf/secure:/usr/hdp/current/ext/hadoop/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> *********(redacted)
SPARK_USER -> *********(redacted)
command:
LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.history.ui.port=18081' \
'-Dspark.driver.port=38932' \
'-Dspark.port.maxRetries=100' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler#gw03.itversity.com:38932 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1540458187951_12531 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__spark_libs__ -> resource { scheme: "hdfs" host: "nn01.itversity.com" port: 8020 file: "/hdp/apps/2.6.5.0-292/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 202745446 timestamp: 1533325894570 type: ARCHIVE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "nn01.itversity.com" port: 8020 file: "/user/karteekkhadoop/.sparkStaging/application_1540458187951_12531/__spark_conf__.zip" } size: 248901 timestamp: 1542815515889 type: ARCHIVE visibility: PRIVATE
===============================================================================
18/11/21 10:51:59 INFO RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8030
18/11/21 10:51:59 INFO YarnRMClient: Registering the ApplicationMaster
18/11/21 10:51:59 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
18/11/21 10:52:00 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
End of LogType:stderr.This log file belongs to a running container (container_e42_1540458187951_12531_01_000001) and so may not be complete.
What is wrong with the code? Please help. Thank you.
You cannot use a local directory like that. As with any Spark reader, the input (and output) storage has to be accessible from each node (driver and executors), and all nodes have to see exactly the same state.
Additionally, please remember that for file system sources, changes to files have to be atomic (like a file system move); non-atomic operations (like appending to a file) won't work.
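For illustration, a minimal sketch of the fix implied above, assuming the input files are instead moved into an HDFS directory (the path below is hypothetical) that every node can resolve:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))
// Every executor and the driver resolve the same HDFS path; files moved into it
// atomically (write to a temporary name, then rename) are picked up on each batch.
val filestream = ssc.textFileStream("hdfs:///user/karteekkhadoop/ch06input")
ssc.start()
ssc.awaitTermination()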

Spark Streaming job won't schedule additional work

Spark 2.1.1 built for Hadoop 2.7.3
Scala 2.11.11
The cluster has 3 Linux RHEL 7.3 Azure VMs, running Spark Standalone deploy mode (no YARN or Mesos, yet).
I have created a very simple Spark Streaming job using IntelliJ, written in Scala. I'm using Maven and building the job into a fat/uber jar that contains all dependencies.
When I run the job locally it works fine. If I copy the jar to the cluster and run it with a master of local[2] it also works fine. However, if I submit the job to the cluster master, it's like it does not want to schedule additional work beyond the first task. The job starts up, grabs however many events are in the Azure Event Hub, processes them successfully, then never does any more work. It does not matter whether I submit the job to the master as just an application or in supervised cluster mode; both do the same thing.
I've looked through all the logs I know of (master, driver (where applicable), and executor) and I am not seeing any errors or warnings that seem actionable. I've altered the log level, shown below, to ALL/INFO/DEBUG and sifted through those logs without finding anything that seems relevant.
It may be worth noting that I had previously created several jobs that connect to Kafka, instead of the Azure Event Hub, using Java, and those jobs run in supervised cluster mode without an issue on this same cluster. This leads me to believe that the cluster configuration isn't the issue; it's either something with my code (below) or the Azure Event Hub.
Any thoughts on where I might check next to isolate this issue? Here is the code for my simple job.
Thanks in advance.
Note: conf.{name} indicates values I'm loading from a config file. I've tested loading and hard-coding them, both with the same result.
package streamingJob

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.eventhubs.EventHubsUtils
import org.joda.time.DateTime

object TestJob {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    sparkConf.setAppName("TestJob")
    // Uncomment to run locally
    //sparkConf.setMaster("local[2]")
    val sparkContext = new SparkContext(sparkConf)
    sparkContext.setLogLevel("ERROR")
    val streamingContext: StreamingContext = new StreamingContext(sparkContext, Seconds(1))

    val readerParams = Map[String, String](
      "eventhubs.policyname" -> conf.policyname,
      "eventhubs.policykey" -> conf.policykey,
      "eventhubs.namespace" -> conf.namespace,
      "eventhubs.name" -> conf.name,
      "eventhubs.partition.count" -> conf.partitionCount,
      "eventhubs.consumergroup" -> conf.consumergroup
    )

    val eventData = EventHubsUtils.createDirectStreams(
      streamingContext,
      conf.namespace,
      conf.progressdir,
      Map("name" -> readerParams))

    eventData.foreachRDD(r => {
      r.foreachPartition { p => {
        p.foreach(d => {
          println(DateTime.now() + ": " + d)
        }) // end of EventData
      }} // foreachPartition
    }) // foreachRDD

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}
Here is a set of logs from when I run this as an application, not cluster/supervised.
/spark/bin/spark-submit --class streamingJob.TestJob --master spark://{ip}:7077 --total-executor-cores 1 /spark/job-files/fatjar.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/06 17:52:04 INFO SparkContext: Running Spark version 2.1.1
17/11/06 17:52:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/06 17:52:05 INFO SecurityManager: Changing view acls to: root
17/11/06 17:52:05 INFO SecurityManager: Changing modify acls to: root
17/11/06 17:52:05 INFO SecurityManager: Changing view acls groups to:
17/11/06 17:52:05 INFO SecurityManager: Changing modify acls groups to:
17/11/06 17:52:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/11/06 17:52:06 INFO Utils: Successfully started service 'sparkDriver' on port 44384.
17/11/06 17:52:06 INFO SparkEnv: Registering MapOutputTracker
17/11/06 17:52:06 INFO SparkEnv: Registering BlockManagerMaster
17/11/06 17:52:06 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/06 17:52:06 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/06 17:52:06 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b5e2c0f3-2500-42c6-b057-cf5d368580ab
17/11/06 17:52:06 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/11/06 17:52:06 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/06 17:52:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/06 17:52:06 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://{ip}:4040
17/11/06 17:52:06 INFO SparkContext: Added JAR file:/spark/job-files/fatjar.jar at spark://{ip}:44384/jars/fatjar.jar with timestamp 1509990726989
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://{ip}:7077...
17/11/06 17:52:07 INFO TransportClientFactory: Successfully created connection to /{ip}:7077 after 72 ms (0 ms spent in bootstraps)
17/11/06 17:52:07 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20171106175207-0000
17/11/06 17:52:07 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44624.
17/11/06 17:52:07 INFO NettyBlockTransferService: Server created on {ip}:44624
17/11/06 17:52:07 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171106175207-0000/0 on worker-20171106173151-{ip}-46086 ({ip}:46086) with 1 cores
17/11/06 17:52:07 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO StandaloneSchedulerBackend: Granted executor ID app-20171106175207-0000/0 on hostPort {ip}:46086 with 1 cores, 1024.0 MB RAM
17/11/06 17:52:07 INFO BlockManagerMasterEndpoint: Registering block manager {ip}:44624 with 366.3 MB RAM, BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171106175207-0000/0 is now RUNNING
17/11/06 17:52:08 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

Unable to connect to Spark master

I start my DataStax Cassandra instance with Spark:
dse cassandra -k
I then run this program (from within Eclipse):
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Start {
  def main(args: Array[String]): Unit = {
    println("***** 1 *****")
    val sparkConf = new SparkConf().setAppName("Start").setMaster("spark://127.0.0.1:7077")
    println("***** 2 *****")
    val sparkContext = new SparkContext(sparkConf)
    println("***** 3 *****")
  }
}
And I get the following output
***** 1 *****
***** 2 *****
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/29 15:27:50 INFO SparkContext: Running Spark version 1.5.2
15/12/29 15:27:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/29 15:27:51 INFO SecurityManager: Changing view acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: Changing modify acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nayan); users with modify permissions: Set(nayan)
15/12/29 15:27:52 INFO Slf4jLogger: Slf4jLogger started
15/12/29 15:27:52 INFO Remoting: Starting remoting
15/12/29 15:27:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#10.0.1.88:55126]
15/12/29 15:27:53 INFO Utils: Successfully started service 'sparkDriver' on port 55126.
15/12/29 15:27:53 INFO SparkEnv: Registering MapOutputTracker
15/12/29 15:27:53 INFO SparkEnv: Registering BlockManagerMaster
15/12/29 15:27:53 INFO DiskBlockManager: Created local directory at /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/blockmgr-21a96671-c33e-498c-83a4-bb5c57edbbfb
15/12/29 15:27:53 INFO MemoryStore: MemoryStore started with capacity 983.1 MB
15/12/29 15:27:53 INFO HttpFileServer: HTTP File server directory is /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/spark-fce0a058-9264-4f2c-8220-c32d90f11bd8/httpd-2a0efcac-2426-49c5-982a-941cfbb48c88
15/12/29 15:27:53 INFO HttpServer: Starting HTTP Server
15/12/29 15:27:53 INFO Utils: Successfully started service 'HTTP file server' on port 55127.
15/12/29 15:27:53 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/29 15:27:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/12/29 15:27:53 INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
15/12/29 15:27:54 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/29 15:27:54 INFO AppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
15/12/29 15:27:54 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#127.0.0.1:7077] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/12/29 15:28:14 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask#1f22aef0 rejected from java.util.concurrent.ThreadPoolExecutor#176cb4af[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
So something is going wrong during the creation of the Spark context.
When I look in $DSE_HOME/logs/spark, it is empty. I'm not sure where else to look.
It turns out that the problem was the Spark library version AND the Scala version. DataStax was running Spark 1.4.1 and Scala 2.10.5, while my Eclipse project was using 1.5.2 and 2.11.7 respectively.
Note that BOTH the Spark library and Scala versions appear to have to match. I tried other combinations, but it only worked when both matched.
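For reference, a minimal sbt sketch of keeping the client build in lock-step with what the cluster ships (the versions come from the answer above; the "provided" scope is my assumption for a jar that runs against the DSE-supplied Spark):

// build.sbt: Scala binary version and Spark artifact version must match the cluster
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"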
I am getting pretty familiar with this part of your posted error:
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://...
It can have numerous causes, pretty much all related to misconfigured IPs. First I would do whatever zero323 says, then here's my two cents: I have solved my own problems recently by using IP addresses, not hostnames, and the only config I use in a simple standalone cluster is SPARK_MASTER_IP.
Setting SPARK_MASTER_IP in $SPARK_HOME/conf/spark-env.sh on your master should then make the master web UI show the IP address you set:
spark://your.ip.address.numbers:7077
And your SparkConf setup can refer to that.
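For illustration, a sketch of what that looks like in the application, assuming 10.0.1.88 (the address the driver log above reports) is where the master binds:

val sparkConf = new SparkConf()
  .setAppName("Start")
  .setMaster("spark://10.0.1.88:7077") // the same address the master web UI should now show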
Having said that, I am not familiar with your specific implementation, but I notice two occurrences in the error containing:
/private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/
Have you looked there to see if there's a logs directory? Is that where $DSE_HOME points? Alternatively, connect to the driver where it creates its web UI:
INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
and you should see a link to an error log there somewhere.
More on the IP vs. hostname thing: this very old bug is marked as Resolved, but I have not figured out what they mean by Resolved, so I just tend toward IP addresses.

Spark cluster can't assign resources from remote Scala application

So, I've been trying to get off the ground running Spark with Scala. I've written a simple test program, which just extends the SparkPi example a bit:
import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  test()
}

def calcPi(spark: SparkContext, args: Array[String], numSlices: Long): Array[Double] = {
  val start = System.nanoTime()
  val slices = if (args.length > 0) args(0).toInt else 2
  val n = math.min(numSlices * slices, Int.MaxValue).toInt // avoid overflow
  val count = spark.parallelize(1 until n, slices).map { i =>
    val x = random * 2 - 1
    val y = random * 2 - 1
    if (x * x + y * y < 1) 1 else 0
  }.reduce(_ + _)
  val piVal = 4.0 * count / n
  println("Pi is roughly " + piVal)
  spark.stop()
  val end = System.nanoTime()
  return Array(piVal, end - start, (piVal - Math.PI) / Math.PI)
}

def test(): Unit = {
  val conf = new SparkConf().setAppName("Pi Test")
  conf.setSparkHome("/usr/local/spark")
  conf.setMaster("spark://<URL_OF_SPARK_CLUSTER>:7077")
  conf.set("spark.executor.memory", "512m")
  conf.set("spark.cores.max", "1")
  conf.set("spark.blockManager.port", "33291")
  conf.set("spark.executor.port", "33292")
  conf.set("spark.broadcast.port", "33293")
  conf.set("spark.fileserver.port", "33294")
  conf.set("spark.driver.port", "33296")
  conf.set("spark.replClassServer.port", "33297")
  val sc = new SparkContext(conf)
  val pi = calcPi(sc, Array(), 1000)
  for (item <- pi) {
    println(item)
  }
}
I then made sure that ports 33291-33300 are open on my machine.
When I run the program, it successfully hits the Spark cluster and seems to assign cores:
But when the program gets to the point where it's actually running the Hadoop job, the application logs say:
15/12/07 11:50:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at BotDetector.scala:49), which has no missing parents
15/12/07 11:50:21 INFO MemoryStore: ensureFreeSpace(1840) called with curMem=0, maxMem=2061647216
15/12/07 11:50:21 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1840.0 B, free 1966.1 MB)
15/12/07 11:50:21 INFO MemoryStore: ensureFreeSpace(1194) called with curMem=1840, maxMem=2061647216
15/12/07 11:50:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1194.0 B, free 1966.1 MB)
15/12/07 11:50:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.5.106:33291 (size: 1194.0 B, free: 1966.1 MB)
15/12/07 11:50:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/12/07 11:50:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at BotDetector.scala:49)
15/12/07 11:50:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/12/07 11:50:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:50:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:51:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:51:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:51:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:51:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:52:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:52:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/07 11:52:22 INFO AppClient$ClientActor: Executor updated: app-20151207175020-0003/0 is now EXITED (Command exited with code 1)
15/12/07 11:52:22 INFO SparkDeploySchedulerBackend: Executor app-20151207175020-0003/0 removed: Command exited with code 1
15/12/07 11:52:22 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/12/07 11:52:22 INFO AppClient$ClientActor: Executor added: app-20151207175020-0003/1 on worker-20151207173821-10.240.0.7-33295 (10.240.0.7:33295) with 5 cores
15/12/07 11:52:22 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151207175020-0003/1 on hostPort 10.240.0.7:33295 with 5 cores, 512.0 MB RAM
15/12/07 11:52:22 INFO AppClient$ClientActor: Executor updated: app-20151207175020-0003/1 is now LOADING
15/12/07 11:52:23 INFO AppClient$ClientActor: Executor updated: app-20151207175020-0003/1 is now RUNNING
15/12/07 11:52:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
and when I go onto the remote server and look at the worker logs, they say:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/12/07 17:50:21 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/12/07 17:50:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/07 17:50:21 INFO spark.SecurityManager: Changing view acls to: hduser,jschirmer
15/12/07 17:50:21 INFO spark.SecurityManager: Changing modify acls to: hduser,jschirmer
15/12/07 17:50:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, jschirmer); users with modify permissions: Set(hduser, jschirmer)
15/12/07 17:50:22 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/12/07 17:50:22 INFO Remoting: Starting remoting
15/12/07 17:50:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher#10.240.0.7:33292]
15/12/07 17:50:22 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 33292.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
... 4 more
15/12/07 17:52:22 INFO util.Utils: Shutdown hook called
I've tried setting the driver and executor ports to explicitly open ports, with the same result. It's unclear what the problem is. Does anyone have any advice?
Also, note that if I compile this exact same code into a fat jar, copy it to the remote server, and run it through spark-submit, it runs successfully. I do have a YARN configuration defined on my server, and I'm open to running Spark on YARN, but my understanding is that this cannot be done from a remote server, since you specify the master as yarn-cluster and there's no place to put the host in the config.
It seems you have a firewall problem. First check whether you have enabled all the required ports in your cluster. Also, Spark uses some random ports by default, so you need to fix those ports for your cluster; only then can you use Spark remotely.
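As an illustration of "fixing the random ports", here is a sketch of the settings that are commonly pinned in this situation. The address comes from the driver log above (192.168.5.106); treat the exact property list as an assumption, since it varies by Spark version:

// In addition to the ports already set in test():
conf.set("spark.driver.host", "192.168.5.106")  // address the workers use to call back to the driver
conf.set("spark.blockManager.port", "33291")    // must be reachable from every worker, in both directions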