Why would a Spark job fail without throwing run-time exceptions? - scala

My job fails at a particular stage, when it invokes a class.
Here's the line:
val stockDataFilteredRDD: RDD[stockPriceInfo] =
lineMapToStockPriceInfoObjectRDD
.map(new stockDataFilter(_).requirementsMet.get)
Here's what I see
15/10/09 16:02:28 INFO Client: Application report from ResourceManager:
application identifier: application_1438798768056_0254
appId: 254
clientToAMToken: null
appDiagnostics:
appMasterHost: ip-10-0-142-138.ec2.internal
appQueue: root.add_twitter_user
appMasterRpcPort: 0
appStartTime: 1444421062565
yarnAppState: RUNNING
distributedFinalState: UNDEFINED
appTrackingUrl: http://myIP.ip
15/10/09 16:02:29 INFO Client: Application report from ResourceManager:
application identifier: application_1438798768056_0254
appId: 254
clientToAMToken: null
appDiagnostics:
appMasterHost: ip-10-0-142-138.ec2.internal
appQueue: root.add_twitter_user
appMasterRpcPort: 0
appStartTime: 1444421062565
yarnAppState: FINISHED
distributedFinalState: FAILED
appTrackingUrl: http://myIP.ip
appUser: add_twitter_user
The class invoked
class stockDataFilter(val s: stockPriceInfo) {
  val dateDelim = "-"
  val timeDelim = ":"
  val dateAndTime = s.dateTime
  val splitDateTime = dateAndTime.split("#")
  val dateStamp = splitDateTime(0)
  val time = splitDateTime(1)
  val splitDate = dateStamp.split(dateDelim)
  val splitTime = time.split(timeDelim)
  val year = splitDate(0)
  val month = splitDate(1)
  val day = splitDate(2)
  val hour = splitTime(0)
  val minute = splitTime(1)
  val second = splitTime(2)
  val openingBell: LocalTime = new LocalTime(9, 30)
  val closingBell: LocalTime = new LocalTime(16, 0)
  val currentTime: LocalTime = new LocalTime(hour.toInt, minute.toInt)
  // Non-trading sessions
  val newYearsDay: Date = new Date(year.toInt - 1900, 0, 1)
  val weekends: List[String] = List("Saturday", "Sunday")
  val date = new Date(year.toInt - 1900, month.toInt - 1, day.toInt)
  val currentDate = new LocalDate(year.toInt - 1990, month.toInt - 1, day.toInt)
  // New York Stock Exchange operates from 9:30 a.m. to 4:00 p.m.
  def isWithinTradingSession: Boolean = {
    val isAfterOpen: Boolean = currentTime.isAfter(openingBell)
    val isBeforeClose: Boolean = currentTime.isBefore(closingBell)
    isAfterOpen && isBeforeClose
  } // returns true if it is within trading time
  def requirementsMet: Option[stockPriceInfo] = isWithinTradingSession match {
    case true => Some(s)
    case false => None
  }
}
I am able to display (and store in HDFS) everything before that point, but once I add this line, the job fails. I've looked at the logs and there are no obvious issues. There was also no compile-time or run-time exception. I've been stuck on this for days and am out of options. Your help would be appreciated. Regards
LOGS:
LogType: stderr
LogLength: 6638
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data01/yarn/nm/usercache/add_twitter_user/filecache/62/twitteryahoofinanceanalytics.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/10 11:27:24 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/10/10 11:27:25 INFO spark.SecurityManager: Changing view acls to: yarn,add_twitter_user
15/10/10 11:27:25 INFO spark.SecurityManager: Changing modify acls to: yarn,add_twitter_user
15/10/10 11:27:25 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, add_twitter_user); users with modify permissions: Set(yarn, add_twitter_user)
15/10/10 11:27:25 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 11:27:25 INFO Remoting: Starting remoting
15/10/10 11:27:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@IP]
15/10/10 11:27:26 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@IP]
15/10/10 11:27:26 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port port.
15/10/10 11:27:26 INFO spark.SecurityManager: Changing view acls to: yarn,add_twitter_user
15/10/10 11:27:26 INFO spark.SecurityManager: Changing modify acls to: yarn,add_twitter_user
15/10/10 11:27:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, add_twitter_user); users with modify permissions: Set(yarn, add_twitter_user)
15/10/10 11:27:26 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/10/10 11:27:26 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/10/10 11:27:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 11:27:26 INFO Remoting: Starting remoting
15/10/10 11:27:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@myIP]
15/10/10 11:27:26 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@myIP]
15/10/10 11:27:26 INFO util.Utils: Successfully started service 'sparkExecutor' on port 54841.
15/10/10 11:27:26 INFO Remoting: Remoting shut down
15/10/10 11:27:26 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/10/10 11:27:26 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@IP:port/user/CoarseGrainedScheduler
15/10/10 11:27:26 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
15/10/10 11:27:26 INFO spark.SecurityManager: Changing view acls to: yarn,add_twitter_user
15/10/10 11:27:26 INFO spark.SecurityManager: Changing modify acls to: yarn,add_twitter_user
15/10/10 11:27:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, add_twitter_user); users with modify permissions: Set(yarn, add_twitter_user)
15/10/10 11:27:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 11:27:26 INFO Remoting: Starting remoting
15/10/10 11:27:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@myIP]
15/10/10 11:27:26 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@myIP]
15/10/10 11:27:26 INFO util.Utils: Successfully started service 'sparkExecutor' on port 36011.
15/10/10 11:27:26 INFO util.AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@IP:PORT/user/MapOutputTracker
15/10/10 11:27:26 INFO util.AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@IP/user/BlockManagerMaster
15/10/10 11:27:26 INFO storage.DiskBlockManager: Created local directory at /data01/yarn/nm/usercache/add_twitter_user/appcache/application_1438798768056_0263/spark-local-20151010112726-839d
15/10/10 11:27:26 INFO storage.DiskBlockManager: Created local directory at /data02/yarn/nm/usercache/add_twitter_user/appcache/application_1438798768056_0263/spark-local-20151010112726-4371
15/10/10 11:27:26 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port port.
15/10/10 11:27:26 INFO network.ConnectionManager: Bound socket to port port with id = ConnectionManagerId(myIP,port)
15/10/10 11:27:26 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
15/10/10 11:27:26 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/10/10 11:27:26 INFO storage.BlockManagerMaster: Registered BlockManager
15/10/10 11:27:26 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@IP/user/HeartbeatReceiver
15/10/10 11:27:27 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
15/10/10 11:27:27 INFO network.ConnectionManager: Selector thread was interrupted!
15/10/10 11:27:27 INFO network.ConnectionManager: ConnectionManager stopped
15/10/10 11:27:27 INFO storage.MemoryStore: MemoryStore cleared
15/10/10 11:27:27 INFO storage.BlockManager: BlockManager stopped
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/10/10 11:27:27 INFO Remoting: Remoting shut down
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/10/10 11:27:27 INFO Remoting: Remoting shut down
15/10/10 11:27:27 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
LogType: stdout
LogLength: 0
Log Contents:
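One thing worth checking, as a guess that the logs above neither confirm nor rule out: `requirementsMet.get` throws `NoSuchElementException` for every record outside trading hours, because `Option.get` on `None` always throws. A minimal sketch of a safer shape follows. It uses plain Scala collections and `java.time` instead of Spark so that it runs standalone, and `TradingFilterSketch` and `StockPriceInfo` are hypothetical stand-ins for the classes in the question, not the original code.

```scala
import java.time.LocalTime
import scala.util.Try

object TradingFilterSketch {
  // Hypothetical stand-in for the question's stockPriceInfo; only dateTime is modeled.
  final case class StockPriceInfo(dateTime: String)

  private val openingBell = LocalTime.of(9, 30)
  private val closingBell = LocalTime.of(16, 0)

  // Returns Some(record) only when the input has the shape "date#HH:mm:ss",
  // the time part parses, and it falls strictly between the two bells.
  // Malformed input yields None instead of an exception.
  def requirementsMet(s: StockPriceInfo): Option[StockPriceInfo] =
    s.dateTime.split("#") match {
      case Array(_, timePart) =>
        Try(LocalTime.parse(timePart)).toOption
          .filter(t => t.isAfter(openingBell) && t.isBefore(closingBell))
          .map(_ => s)
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val records = List(
      StockPriceInfo("2015-10-09#10:15:00"),
      StockPriceInfo("2015-10-09#18:00:00"),
      StockPriceInfo("not-a-timestamp")
    )
    // flatMap drops the None results instead of calling .get on them.
    val kept = records.flatMap(r => requirementsMet(r))
    println(kept)
  }
}
```

On an RDD the same idea would be `flatMap` in place of `map(...)` followed by `.get`, so that records failing the filter are dropped rather than raising an exception inside a task.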

Related

unable to print scala word count

I'm trying to write a Scala program that counts the words in a text file and prints the final count (on Cloudera, using Spark).
import scala.io.Codec.string2codec
import scala.io.Source
import scala.reflect.io.File
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleWordCount {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Word Count")
val sc = new SparkContext(conf)
The program finds the file at the correct location: when I put in a wrong location, it reports an error.
scala.io.Source.fromFile("/home/cloudera/Books/book1.txt")
.getLines
.flatMap(_.split("\\W+"))
.foldLeft(Map.empty[String, Int]){
(count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
I've tried different ways to print the count here but got errors, e.g.
System.Out.Println(count)
[error] /home/cloudera/src/main/scala/SimpleWordCount.scala:19:21: type mismatch;
[error] found : Unit
[error] required: scala.collection.immutable.Map[String,Int]
System.out.println(word,count)
type mismatch;
[error] found : Unit
[error] required: scala.collection.immutable.Map[String,Int]
[error] System.out.println(word,count)
}
I added the following line to check whether the program was running:
System.out.println("This is working over here !!!!!!!!!#$%%E^$%^%%$%#$^%")
}
}
When I run the code, it produces the following output:
[cloudera@quickstart ~]$ spark-submit --master=local[*] --class=SimpleWordCount /home/cloudera/target/scala-2.10/wordcount_2.10-1.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/avro/avro-tools-1.7.6-cdh5.12.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/12/04 08:13:09 INFO spark.SparkContext: Running Spark version 1.6.0
18/12/04 08:13:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/12/04 08:13:10 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 192.168.182.129 instead (on interface eth3)
18/12/04 08:13:10 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/12/04 08:13:10 INFO spark.SecurityManager: Changing view acls to: cloudera
18/12/04 08:13:10 INFO spark.SecurityManager: Changing modify acls to: cloudera
18/12/04 08:13:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
18/12/04 08:13:10 INFO util.Utils: Successfully started service 'sparkDriver' on port 34679.
18/12/04 08:13:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/12/04 08:13:11 INFO Remoting: Starting remoting
18/12/04 08:13:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.182.129:47272]
18/12/04 08:13:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.182.129:47272]
18/12/04 08:13:11 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 47272.
18/12/04 08:13:11 INFO spark.SparkEnv: Registering MapOutputTracker
18/12/04 08:13:11 INFO spark.SparkEnv: Registering BlockManagerMaster
18/12/04 08:13:11 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-1f139fce-3b13-4c07-bed6-7b35f82ccc6a
18/12/04 08:13:11 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
18/12/04 08:13:11 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/12/04 08:13:11 INFO server.Server: jetty-8.y.z-SNAPSHOT
18/12/04 08:13:11 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
18/12/04 08:13:11 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/12/04 08:13:11 INFO ui.SparkUI: Started SparkUI at http://192.168.182.129:4040
18/12/04 08:13:11 INFO spark.SparkContext: Added JAR file:/home/cloudera/target/scala-2.10/wordcount_2.10-1.0.jar at spark://192.168.182.129:34679/jars/wordcount_2.10-1.0.jar with timestamp 1543939991769
18/12/04 08:13:11 INFO executor.Executor: Starting executor ID driver on host localhost
18/12/04 08:13:11 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58495.
18/12/04 08:13:11 INFO netty.NettyBlockTransferService: Server created on 58495
18/12/04 08:13:11 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/12/04 08:13:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:58495 with 530.3 MB RAM, BlockManagerId(driver, localhost, 58495)
18/12/04 08:13:11 INFO storage.BlockManagerMaster: Registered BlockManager
The println call seems to work here:
This is working over here !!!!!!!!!#$%%E^$%^%%$%#$^%
18/12/04 08:13:12 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
18/12/04 08:13:12 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
18/12/04 08:13:12 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.182.129:4040
18/12/04 08:13:12 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/04 08:13:12 INFO storage.MemoryStore: MemoryStore cleared
18/12/04 08:13:12 INFO storage.BlockManager: BlockManager stopped
18/12/04 08:13:12 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/12/04 08:13:12 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/04 08:13:12 INFO spark.SparkContext: Successfully stopped SparkContext
18/12/04 08:13:12 INFO util.ShutdownHookManager: Shutdown hook called
18/12/04 08:13:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/12/04 08:13:12 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e468a57d-10e2-472e-98f2-f3701f6a4b1a
Maybe this can help you?
As I said, the problem is that the function passed to foldLeft has to return the updated Map, and println returns Unit, so a print cannot be its last expression. You can add a print first and then return the map.
val lines = List(
  "Hello, World!",
  "Goodbye, World!",
  "Hello, Hadoop!"
)
val wordCount =
  lines
    .flatMap(_.split("\\W+"))
    .foldLeft(Map.empty[String, Int]) {
      (count, word) =>
        // Print first; the last expression must be the updated Map.
        println(s"DEBUG count: $count for word: '$word'.")
        count + (word -> (count.getOrElse(word, 0) + 1))
    }
val formattedWordCount =
  wordCount
    .map(tuple => s"${tuple._1} -> ${tuple._2}")
    .mkString("\n", "\n", "\n")
println(s"Final Word Count: $formattedWordCount")
Output
DEBUG count: Map() for word: 'Hello'.
DEBUG count: Map(Hello -> 1) for word: 'World'.
DEBUG count: Map(Hello -> 1, World -> 1) for word: 'Goodbye'.
DEBUG count: Map(Hello -> 1, World -> 1, Goodbye -> 1) for word: 'World'.
DEBUG count: Map(Hello -> 1, World -> 2, Goodbye -> 1) for word: 'Hello'.
DEBUG count: Map(Hello -> 2, World -> 2, Goodbye -> 1) for word: 'Hadoop'.
Final Word Count:
Hello -> 2
World -> 2
Goodbye -> 1
Hadoop -> 1
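The same shape can be applied to the file-based version from the question: bind the result of the fold to a value and print it after the fold has finished. The sketch below is only an illustration under assumptions. `FileWordCountSketch` and `countWords` are hypothetical names, and a temporary file stands in for /home/cloudera/Books/book1.txt so that the example runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object FileWordCountSketch {
  // Counts words in an iterator of lines. The last expression of the
  // fold function is the updated map, which is what foldLeft requires.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\W+"))
      .filter(_.nonEmpty) // split can yield an empty token when a line starts with punctuation
      .foldLeft(Map.empty[String, Int]) { (count, word) =>
        count + (word -> (count.getOrElse(word, 0) + 1))
      }

  def main(args: Array[String]): Unit = {
    // Hypothetical temporary file used in place of the book from the question.
    val path = Files.createTempFile("book", ".txt")
    Files.write(path, "Hello, World!\nGoodbye, World!\n".getBytes(StandardCharsets.UTF_8))
    val source = Source.fromFile(path.toFile, "UTF-8")
    try {
      val counts = countWords(source.getLines())
      // Printing happens after the fold, on its result.
      counts.foreach { case (word, n) => println(s"$word -> $n") }
    } finally {
      source.close()
      Files.delete(path)
    }
  }
}
```

Note that this, like the code in the question, reads the file on the driver with scala.io.Source and does not use the SparkContext at all.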

Hortonworks, Eclipse and Kerberos Client (Authentication, HOW?)

Hello everybody, we have a kerberized HDP (Hortonworks) cluster. We can run Spark jobs from spark-submit (CLI) and from Talend Big Data, but not from Eclipse.
We have a Windows client machine where Eclipse is installed and the MIT Kerberos client for Windows is configured (TGT configuration). The goal is to run a Spark job from Eclipse. The Spark-related portion of the Java code is operational and has been tested via the CLI. Part of the job's code is shown below.
private void setConfigurationProperties()
{
    try {
        sConfig.setAppName("abcd-name");
        sConfig.setMaster("yarn-client");
        sConfig.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        sConfig.set("spark.hadoop.yarn.resourcemanager.address", "rs.abcd.com:8032");
        sConfig.set("spark.hadoop.yarn.resourcemanager.scheduler.address", "rs.abcd.com:8030");
        sConfig.set("spark.hadoop.mapreduce.jobhistory.address", "rs.abcd.com:10020");
        sConfig.set("spark.hadoop.yarn.app.mapreduce.am.staging-dir", "/dir");
        sConfig.set("spark.executor.memory", "2g");
        sConfig.set("spark.executor.cores", "4");
        sConfig.set("spark.executor.instances", "24");
        sConfig.set("spark.yarn.am.cores", "24");
        sConfig.set("spark.yarn.am.memory", "16g");
        sConfig.set("spark.eventLog.enabled", "true");
        sConfig.set("spark.eventLog.dir", "hdfs:///spark-history");
        sConfig.set("spark.shuffle.memoryFraction", "0.4");
        sConfig.set("spark.hadoop." + "mapreduce.application.framework.path", "/hdp/apps/version/mapreduce/mapreduce.tar.gz#mr-framework");
        sConfig.set("spark.local.dir", "/tmp");
        sConfig.set("spark.hadoop.yarn.resourcemanager.principal", "rm/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.mapreduce.jobhistory.principal", "jhs/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.dfs.namenode.kerberos.principal", "nn/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.fs.defaultFS", "hdfs://hdfs.abcd.com:8020");
        sConfig.set("spark.hadoop.dfs.client.use.datanode.hostname", "true");
    }
}
When we run the code the following error pops up:
17/04/05 23:37:06 INFO Remoting: Starting remoting
17/04/05 23:37:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@1.1.1.1:54356]
17/04/05 23:37:06 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54356.
17/04/05 23:37:06 INFO SparkEnv: Registering MapOutputTracker
17/04/05 23:37:06 INFO SparkEnv: Registering BlockManagerMaster
17/04/05 23:37:06 INFO DiskBlockManager: Created local directory at C:\tmp\blockmgr-baee2441-1977-4410-b52f-4275ff35d6c1
17/04/05 23:37:06 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
17/04/05 23:37:06 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/05 23:37:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/04/05 23:37:07 INFO SparkUI: Started SparkUI at http://1.1.1.1:4040
17/04/05 23:37:07 INFO RMProxy: Connecting to ResourceManager at rs.abcd.com/1.1.1.1:8032
17/04/05 23:37:07 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
17/04/05 23:37:07 INFO SparkUI: Stopped Spark web UI at http://1.1.1.1:4040
Please advise how to specify Kerberos authentication instead of SIMPLE in the Java code, or how to make the client request Kerberos authentication. What should the overall process look like, and what would be the right approach?
Thank you

Spark Internal REST API: Unable to find dependent jars

I am trying to submit a Spark program using Spark's internal REST API.
The submission request is below. The required supporting jars are in place.
curl -X POST http://quickstart.cloudera:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
"action" : "CreateSubmissionRequest",
"appArgs" : [ "SampleSparkProgramApp" ],
"appResource" : "file:///home/cloudera/test_sample_example/spark-example.jar",
"clientSparkVersion" : "1.5.0",
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass" : "com.example.SampleSparkProgram",
"sparkProperties" : {
"spark.jars" : "file:///home/cloudera/test_sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar,file:///home/cloudera/test_sample_example/lib/mongo-java-driver-3.0.4.jar,file:///home/cloudera/test_sample_example/lib/lucene-analyzers-common-5.4.0.jar,file:///home/cloudera/test_sample_example/lib/lucene-core-5.2.1.jar",
"spark.driver.supervise" : "false",
"spark.app.name" : "MyJob",
"spark.eventLog.enabled": "true",
"spark.submit.deployMode" : "client",
"spark.master" : "spark://quickstart.cloudera:6066"
}
}'
The class com.mongodb.hadoop.MongoInputFormat is available in mongo-hadoop-core-1.0-snapshot.jar, and the jar is added to the request under the key "spark.jars".
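A quick way to check whether the driver JVM can actually see that class is to probe it with `Class.forName` near the start of `main`. This is a diagnostic sketch only: `ClasspathProbe` and `isLoadable` are hypothetical names and not part of Spark or of the program in the question.

```scala
object ClasspathProbe {
  // Returns true if the named class can be loaded by the class loader
  // that loaded this object, false if it is missing or fails to link.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val name = "com.mongodb.hadoop.MongoInputFormat"
    println(s"$name loadable on this JVM: ${isLoadable(name)}")
  }
}
```

If the probe reports false on the driver, the jar is not on the driver's classpath even though it is listed under "spark.jars".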
I am getting the error below in the Spark UI logs.
1.5.0-cdh5.5.0 stderr log page for driver-20160121040910-0026
Launch Command: "/usr/java/jdk1.7.0_67-cloudera/jre/bin/java" "-cp" "/usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.0-cdh5.5.0-hadoop2.6.0-cdh5.5.0.jar:/etc/hadoop/conf/:/usr/lib/spark/sbin/../lib/spark-assembly.jar:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*" "-Xms1024M" "-Xmx1024M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.supervise=false" "-Dspark.app.name=MyJob" "-Dspark.jars=file:///home/cloudera/test_sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar,file:///home/cloudera/test_sample_example/lib/mongo-java-driver-3.0.4.jar,file:///home/cloudera/test_sample_example/lib/lucene-analyzers-common-5.4.0.jar,file:///home/cloudera/test_sample_example/lib/lucene-core-5.2.1.jar" "-Dspark.master=spark://quickstart.cloudera:7077" "-Dspark.submit.deployMode=client" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker#182.162.106.131:7078/user/Worker" "/var/run/spark/work/driver-20160121040910-0026/spark-example.jar" "com.example.SampleSparkProgram" "SampleSparkProgramApp"
========================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/01/21 04:09:16 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 182.162.106.131 instead (on interface eth1)
16/01/21 04:09:16 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/01/21 04:09:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/21 04:09:21 INFO spark.SecurityManager: Changing view acls to: root
16/01/21 04:09:21 INFO spark.SecurityManager: Changing modify acls to: root
16/01/21 04:09:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/21 04:09:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/21 04:09:26 INFO Remoting: Starting remoting
16/01/21 04:09:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://Driver@182.162.106.131:38181]
16/01/21 04:09:27 INFO Remoting: Remoting now listens on addresses: [akka.tcp://Driver@182.162.106.131:38181]
16/01/21 04:09:27 INFO util.Utils: Successfully started service 'Driver' on port 38181.
16/01/21 04:09:27 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@182.162.106.131:7078/user/Worker
16/01/21 04:09:28 INFO spark.SparkContext: Running Spark version 1.5.0-cdh5.5.0
16/01/21 04:09:28 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@182.162.106.131:7078/user/Worker
16/01/21 04:09:28 INFO spark.SecurityManager: Changing view acls to: root
16/01/21 04:09:28 INFO spark.SecurityManager: Changing modify acls to: root
16/01/21 04:09:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/21 04:09:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/21 04:09:29 INFO Remoting: Starting remoting
16/01/21 04:09:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.162.106.131:35467]
16/01/21 04:09:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@182.162.106.131:35467]
16/01/21 04:09:29 INFO util.Utils: Successfully started service 'sparkDriver' on port 35467.
16/01/21 04:09:29 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/21 04:09:30 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/21 04:09:30 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-6b24e210-4002-4e28-ac60-c2ecc497b914
16/01/21 04:09:30 INFO storage.MemoryStore: MemoryStore started with capacity 534.5 MB
16/01/21 04:09:31 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-bc7bc70e-af91-44cb-a764-8c6d1d9b3acc/httpd-65b7bbf1-af6d-4252-8629-95fcb60f706f
16/01/21 04:09:31 INFO spark.HttpServer: Starting HTTP Server
16/01/21 04:09:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/21 04:09:31 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38126
16/01/21 04:09:31 INFO util.Utils: Successfully started service 'HTTP file server' on port 38126.
16/01/21 04:09:31 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/21 04:09:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/21 04:09:33 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/21 04:09:33 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/21 04:09:33 INFO ui.SparkUI: Started SparkUI at http://182.162.106.131:4040
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar at hdfs:///user/cloudera/sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar with timestamp 1453378173778
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/mongo-java-driver-3.0.4.jar at hdfs:///user/cloudera/sample_example/lib/mongo-java-driver-3.0.4.jar with timestamp 1453378173782
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/lucene-analyzers-common-5.4.0.jar at hdfs:///user/cloudera/sample_example/lib/lucene-analyzers-common-5.4.0.jar with timestamp 1453378173783
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/lucene-core-5.2.1.jar at hdfs:///user/cloudera/sample_example/lib/lucene-core-5.2.1.jar with timestamp 1453378173783
16/01/21 04:09:34 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/01/21 04:09:34 INFO client.AppClient$ClientEndpoint: Connecting to master spark://quickstart.cloudera:7077...
16/01/21 04:09:35 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160121040935-0025
16/01/21 04:09:36 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38749.
16/01/21 04:09:36 INFO netty.NettyBlockTransferService: Server created on 38749
16/01/21 04:09:36 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/21 04:09:36 INFO storage.BlockManagerMasterEndpoint: Registering block manager 182.162.106.131:38749 with 534.5 MB RAM, BlockManagerId(driver, 182.162.106.131, 38749)
16/01/21 04:09:36 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/21 04:09:40 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/app-20160121040935-0025
16/01/21 04:09:40 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/21 04:09:40 INFO analyser.NaiveByesAnalyserFactory: ENTERING
16/01/21 04:09:40 INFO dao.MongoDataExtractor: ENTERING
16/01/21 04:09:41 INFO dao.MongoDataExtractor: EXITING
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
at com.examples.dao.MongoDataExtractor.getData(MongoDataExtractor.java:35)
at com.examples.analyser.NaiveByesAnalyserFactory.getNaiveByesAnalyserFactory(NaiveByesAnalyserFactory.java:27)
at com.example.SampleSparkProgram.main(SampleSparkProgram.java:24)
... 6 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
16/01/21 04:09:41 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/01/21 04:09:41 INFO ui.SparkUI: Stopped Spark web UI at http://182.162.106.131:4040
16/01/21 04:09:41 INFO scheduler.DAGScheduler: Stopping DAGScheduler
16/01/21 04:09:41 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/01/21 04:09:41 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/01/21 04:09:41 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/21 04:09:41 INFO storage.MemoryStore: MemoryStore cleared
16/01/21 04:09:41 INFO storage.BlockManager: BlockManager stopped
16/01/21 04:09:41 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/01/21 04:09:41 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/21 04:09:41 INFO spark.SparkContext: Successfully stopped SparkContext
16/01/21 04:09:41 INFO util.ShutdownHookManager: Shutdown hook called
16/01/21 04:09:41 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bc7bc70e-af91-44cb-a764-8c6d1d9b3acc

Unable to connect to Spark master

I start my DataStax cassandra instance with Spark:
dse cassandra -k
I then run this program (from within Eclipse):
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object Start {
def main(args: Array[String]): Unit = {
println("***** 1 *****")
val sparkConf = new SparkConf().setAppName("Start").setMaster("spark://127.0.0.1:7077")
println("***** 2 *****")
val sparkContext = new SparkContext(sparkConf)
println("***** 3 *****")
}
}
And I get the following output
***** 1 *****
***** 2 *****
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/29 15:27:50 INFO SparkContext: Running Spark version 1.5.2
15/12/29 15:27:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/29 15:27:51 INFO SecurityManager: Changing view acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: Changing modify acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nayan); users with modify permissions: Set(nayan)
15/12/29 15:27:52 INFO Slf4jLogger: Slf4jLogger started
15/12/29 15:27:52 INFO Remoting: Starting remoting
15/12/29 15:27:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.0.1.88:55126]
15/12/29 15:27:53 INFO Utils: Successfully started service 'sparkDriver' on port 55126.
15/12/29 15:27:53 INFO SparkEnv: Registering MapOutputTracker
15/12/29 15:27:53 INFO SparkEnv: Registering BlockManagerMaster
15/12/29 15:27:53 INFO DiskBlockManager: Created local directory at /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/blockmgr-21a96671-c33e-498c-83a4-bb5c57edbbfb
15/12/29 15:27:53 INFO MemoryStore: MemoryStore started with capacity 983.1 MB
15/12/29 15:27:53 INFO HttpFileServer: HTTP File server directory is /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/spark-fce0a058-9264-4f2c-8220-c32d90f11bd8/httpd-2a0efcac-2426-49c5-982a-941cfbb48c88
15/12/29 15:27:53 INFO HttpServer: Starting HTTP Server
15/12/29 15:27:53 INFO Utils: Successfully started service 'HTTP file server' on port 55127.
15/12/29 15:27:53 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/29 15:27:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/12/29 15:27:53 INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
15/12/29 15:27:54 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/29 15:27:54 INFO AppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
15/12/29 15:27:54 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.0.0.1:7077] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/12/29 15:28:14 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1f22aef0 rejected from java.util.concurrent.ThreadPoolExecutor@176cb4af[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
So something is happening during the creation of the spark context.
When I look in $DSE_HOME/logs/spark, it is empty. I am not sure where else to look.
It turns out that the problem was both the Spark library version and the Scala version. DataStax was running Spark 1.4.1 and Scala 2.10.5, while my Eclipse project was using 1.5.2 and 2.11.7 respectively.
Note that both the Spark library version and the Scala version appear to have to match. I tried other combinations, but it only worked when both matched.
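The mismatch described above can be made explicit with a small check. This is only a sketch: Versions and VersionCheck are hypothetical names, not part of Spark or DSE, and treating any difference in the Spark version as a mismatch is an assumption that may be stricter than necessary. The version numbers are the ones quoted in this answer.

```scala
// Hypothetical helper for illustration; not part of Spark or DSE.
final case class Versions(sparkVersion: String, scalaVersion: String)

object VersionCheck {
  // Scala 2.x releases are binary compatible only within one major.minor line,
  // so 2.10.5 and 2.11.7 reduce to the incompatible pair "2.10" and "2.11".
  def scalaBinary(version: String): String =
    version.split('.').take(2).mkString(".")

  // Returns one message per mismatch; an empty list means the two sides line up.
  def mismatches(client: Versions, cluster: Versions): List[String] = {
    val sparkIssue =
      if (client.sparkVersion == cluster.sparkVersion) Nil
      else List(s"Spark: client ${client.sparkVersion}, cluster ${cluster.sparkVersion}")
    val scalaIssue =
      if (scalaBinary(client.scalaVersion) == scalaBinary(cluster.scalaVersion)) Nil
      else List(s"Scala: client ${client.scalaVersion}, cluster ${cluster.scalaVersion}")
    sparkIssue ++ scalaIssue
  }

  def main(args: Array[String]): Unit = {
    val client  = Versions(sparkVersion = "1.5.2", scalaVersion = "2.11.7") // Eclipse project
    val cluster = Versions(sparkVersion = "1.4.1", scalaVersion = "2.10.5") // DataStax
    mismatches(client, cluster).foreach(println)
  }
}
```

In practice the fix is to change the project's Scala version and Spark dependency so that this list comes back empty.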
I am getting pretty familiar with this part of your posted error:
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://...
It can have numerous causes, pretty much all related to misconfigured IPs. First I would do whatever zero323 says, then here's my two cents: I have solved my own problems recently by using IP addresses, not hostnames, and the only config I use in a simple standalone cluster is SPARK_MASTER_IP.
Setting SPARK_MASTER_IP in $SPARK_HOME/conf/spark-env.sh on your master should then make the master web UI show the IP address you set:
spark://your.ip.address.numbers:7077
And your SparkConf setup can refer to that.
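As a rough sketch of the "IP addresses, not hostnames" advice, the helper below builds the master URL only from an IPv4 literal. MasterUrl, isIpv4Literal and forIp are hypothetical names introduced for illustration; the default port 7077 is the one that appears in the logs above.

```scala
// Hypothetical helper for illustration; not part of Spark.
object MasterUrl {
  // One octet of a dotted-quad IPv4 literal (0-255).
  private val Octet = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
  private val Ipv4 = Seq.fill(4)(Octet).mkString("\\.").r

  // True only for literals such as 10.0.1.88, never for hostnames.
  def isIpv4Literal(host: String): Boolean =
    Ipv4.pattern.matcher(host).matches()

  // Builds spark://ip:port and refuses anything that is not an IPv4 literal.
  def forIp(ip: String, port: Int = 7077): String = {
    require(isIpv4Literal(ip), s"expected an IPv4 literal, got: $ip")
    s"spark://$ip:$port"
  }

  def main(args: Array[String]): Unit =
    println(forIp("10.0.1.88"))
}
```

The resulting string is what you would pass to setMaster on your SparkConf, and it should match the URL shown at the top of the master web UI.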
Having said that, I am not familiar with your specific implementation but I notice in the error two occurrences containing:
/private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/
Have you looked there to see if there's a logs directory? Is that where $DSE_HOME points? Alternatively, connect to the driver at the address where it starts its web UI:
INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
and you should see a link to an error log there somewhere.
More on the IP vs. hostname thing, this very old bug is marked as Resolved but I have not figured out what they mean by Resolved, so I just tend toward IP addresses.

How to set remoteHost in spark RetryingBlockFetcher IOException

I apologize for the extremely long post, but I wanted to make the problem clear.
I have built a cluster in which the master runs on a different machine from the workers. The workers all run on a single, fairly powerful machine. There is no firewall between the two machines.
URL: spark://MASTER_IP:7077
Workers: 10
Cores: 10 Total, 0 Used
Memory: 40.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
Before launching the app, the worker log file contains the following (an example for one worker):
15/03/06 18:52:19 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/03/06 18:52:19 INFO SecurityManager: Changing view acls to: szymon
15/03/06 18:52:19 INFO SecurityManager: Changing modify acls to: szymon
15/03/06 18:52:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(szymon); users with modify permissions: Set(szymon)
15/03/06 18:52:20 INFO Slf4jLogger: Slf4jLogger started
15/03/06 18:52:20 INFO Remoting: Starting remoting
15/03/06 18:52:20 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@WORKER_MACHINE_IP:42240]
15/03/06 18:52:20 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkWorker@WORKER_MACHINE_IP:42240]
15/03/06 18:52:20 INFO Utils: Successfully started service 'sparkWorker' on port 42240.
15/03/06 18:52:20 INFO Worker: Starting Spark worker WORKER_MACHINE_IP:42240 with 1 cores, 4.0 GB RAM
15/03/06 18:52:20 INFO Worker: Spark home: /home/szymon/spark
15/03/06 18:52:20 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/03/06 18:52:20 INFO WorkerWebUI: Started WorkerWebUI at http://WORKER_MACHINE_IP:8081
15/03/06 18:52:20 INFO Worker: Connecting to master spark://MASTER_IP:7077...
15/03/06 18:52:20 INFO Worker: Successfully registered with master spark://MASTER_IP:7077
I launch my application on a cluster (on the master machine)
./bin/spark-submit --class SimpleApp --master spark://MASTER_IP:7077 --executor-memory 3g --total-executor-cores 10 code/trial_2.11-0.9.jar
My app is then fetched by the workers. This is an example of the log output for one worker (WORKER_MACHINE):
15/03/06 18:07:45 INFO ExecutorRunner: Launch command: "/usr/java/jdk1.8.0_31/bin/java" "-cp" "::/home/machine/spark/conf:/home/machine/spark/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar" "-Dspark.driver.port=56753" "-Dlog4j.configuration=file:////home/machine/spark/conf/log4j.properties" "-Dspark.driver.host=MASTER_IP" "-Xms3072M" "-Xmx3072M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://sparkDriver@MASTER_IP:56753/user/CoarseGrainedScheduler" "4" "WORKER_MACHINE_IP" "1" "app-20150306181450-0000" "akka.tcp://sparkWorker@WORKER_MACHINE_IP:45288/user/Worker"
The app wants to connect to localhost at address 127.0.0.1 instead of MASTER_IP (I believe).
How could it be fixed?
15/03/06 18:58:52 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to localhost/127.0.0.1:56545
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
The problem is caused by the createClient method in TransportClientFactory (in spark-network-common_2.10-1.2.1-sources.jar), where String remoteHost is set to localhost:
/**
* Create a {@link TransportClient} connecting to the given remote host / port.
*
* We maintains an array of clients (size determined by spark.shuffle.io.numConnectionsPerPeer)
* and randomly picks one to use. If no client was previously created in the randomly selected
* spot, this function creates a new client and places it there.
*
* Prior to the creation of a new TransportClient, we will execute all
* {@link TransportClientBootstrap}s that are registered with this factory.
*
* This blocks until a connection is successfully established and fully bootstrapped.
*
* Concurrency: This method is safe to call from multiple threads.
*/
public TransportClient createClient(String remoteHost, int remotePort) throws IOException {
// Get connection from the connection pool first.
// If it is not found or not active, create a new one.
final InetSocketAddress address = new InetSocketAddress(remoteHost, remotePort);
.
.
.
clientPool.clients[clientIndex] = createClient(address);
}
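The localhost/127.0.0.1:56545 text in the error has the hostname/ip:port shape that java.net.InetSocketAddress produces when it prints a resolved address. That suggests the string handed to createClient as remoteHost was literally localhost, rather than the worker or master IP. A minimal sketch using only the JDK (ResolveDemo and its methods are hypothetical names):

```scala
import java.net.InetSocketAddress

// Hypothetical demo object; it only mirrors the InetSocketAddress call in the excerpt above.
object ResolveDemo {
  // Same constructor as in createClient: the host name is resolved immediately.
  def describe(host: String, port: Int): String =
    new InetSocketAddress(host, port).toString

  // True when the host resolves to a loopback address such as 127.0.0.1.
  def isLoopback(host: String): Boolean = {
    val resolved = new InetSocketAddress(host, 0).getAddress
    resolved != null && resolved.isLoopbackAddress
  }

  def main(args: Array[String]): Unit = {
    // The port is the one from the error message.
    println(describe("localhost", 56545))
    println(isLoopback("localhost"))
  }
}
```

If the printed address matches the one in the error, the thing to look for is which process advertises itself as localhost, not the createClient code itself.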
Here is the spark-env.sh file on the worker side:
export SPARK_HOME=/home/szymon/spark
export SPARK_MASTER_IP=MASTER_IP
export SPARK_MASTER_WEBUI_PORT=8081
export SPARK_LOCAL_IP=WORKER_MACHINE_IP
export SPARK_DRIVER_HOST=WORKER_MACHINE_IP
export SPARK_LOCAL_DIRS=/home/szymon/spark/slaveData
export SPARK_WORKER_INSTANCES=10
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_DIR=/home/szymon/spark/work
And on the master
export SPARK_MASTER_IP=MASTER_IP
export SPARK_LOCAL_IP=MASTER_IP
export SPARK_MASTER_WEBUI_PORT=8081
export SPARK_JAVA_OPTS="-Dlog4j.configuration=file:////home/szymon/spark/conf/log4j.properties -Dspark.driver.host=MASTER_IP"
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=10"
This is the full log output with more details
15/03/06 18:58:50 INFO Worker: Asked to launch executor app-20150306190555-0000/0 for Simple Application
15/03/06 18:58:50 INFO ExecutorRunner: Launch command: "/usr/java/jdk1.8.0_31/bin/java" "-cp" "::/home/szymon/spark/conf:/home/szymon/spark/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar" "-Dspark.driver.port=49407" "-Dlog4j.configuration=file:////home/szymon/spark/conf/log4j.properties" "-Dspark.driver.host=MASTER_IP" "-Xms3072M" "-Xmx3072M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://sparkDriver@MASTER_IP:49407/user/CoarseGrainedScheduler" "0" "WORKER_MACHINE_IP" "1" "app-20150306190555-0000" "akka.tcp://sparkWorker@WORKER_MACHINE_IP:42240/user/Worker"
15/03/06 18:58:50 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/03/06 18:58:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/06 18:58:51 INFO SecurityManager: Changing view acls to: szymon
15/03/06 18:58:51 INFO SecurityManager: Changing modify acls to: szymon
15/03/06 18:58:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(szymon); users with modify permissions: Set(szymon)
15/03/06 18:58:51 INFO Slf4jLogger: Slf4jLogger started
15/03/06 18:58:51 INFO Remoting: Starting remoting
15/03/06 18:58:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@WORKER_MACHINE_IP:52038]
15/03/06 18:58:51 INFO Utils: Successfully started service 'driverPropsFetcher' on port 52038.
15/03/06 18:58:52 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/03/06 18:58:52 INFO SecurityManager: Changing view acls to: szymon
15/03/06 18:58:52 INFO SecurityManager: Changing modify acls to: szymon
15/03/06 18:58:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(szymon); users with modify permissions: Set(szymon)
15/03/06 18:58:52 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/03/06 18:58:52 INFO Slf4jLogger: Slf4jLogger started
15/03/06 18:58:52 INFO Remoting: Starting remoting
15/03/06 18:58:52 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/03/06 18:58:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@WORKER_MACHINE_IP:37114]
15/03/06 18:58:52 INFO Utils: Successfully started service 'sparkExecutor' on port 37114.
15/03/06 18:58:52 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@MASTER_IP:49407/user/CoarseGrainedScheduler
15/03/06 18:58:52 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@WORKER_MACHINE_IP:42240/user/Worker
15/03/06 18:58:52 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@WORKER_MACHINE_IP:42240/user/Worker
15/03/06 18:58:52 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
15/03/06 18:58:52 INFO Executor: Starting executor ID 0 on host WORKER_MACHINE_IP
15/03/06 18:58:52 INFO SecurityManager: Changing view acls to: szymon
15/03/06 18:58:52 INFO SecurityManager: Changing modify acls to: szymon
15/03/06 18:58:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(szymon); users with modify permissions: Set(szymon)
15/03/06 18:58:52 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@MASTER_IP:49407/user/MapOutputTracker
15/03/06 18:58:52 INFO AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@MASTER_IP:49407/user/BlockManagerMaster
15/03/06 18:58:52 INFO DiskBlockManager: Created local directory at /home/szymon/spark/slaveData/spark-b09c3727-8559-4ab8-ab32-1f5ecf7aeaf2/spark-0c892a4d-c8b9-4144-a259-8077f5316b52/spark-89577a43-fb43-4a12-a305-34b267b01f8a/spark-7ad207c4-9d37-42eb-95e4-7b909b71c687
15/03/06 18:58:52 INFO MemoryStore: MemoryStore started with capacity 1589.8 MB
15/03/06 18:58:52 INFO NettyBlockTransferService: Server created on 51205
15/03/06 18:58:52 INFO BlockManagerMaster: Trying to register BlockManager
15/03/06 18:58:52 INFO BlockManagerMaster: Registered BlockManager
15/03/06 18:58:52 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@MASTER_IP:49407/user/HeartbeatReceiver
15/03/06 18:58:52 INFO CoarseGrainedExecutorBackend: Got assigned task 0
15/03/06 18:58:52 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/06 18:58:52 INFO Executor: Fetching http://MASTER_IP:57850/jars/trial_2.11-0.9.jar with timestamp 1425665154479
15/03/06 18:58:52 INFO Utils: Fetching http://MASTER_IP:57850/jars/trial_2.11-0.9.jar to /home/szymon/spark/slaveData/spark-b09c3727-8559-4ab8-ab32-1f5ecf7aeaf2/spark-0c892a4d-c8b9-4144-a259-8077f5316b52/spark-411cd372-224e-44c1-84ab-b0c3984a6361/fetchFileTemp7857926599487994869.tmp
15/03/06 18:58:52 INFO Utils: Copying /home/szymon/spark/slaveData/spark-b09c3727-8559-4ab8-ab32-1f5ecf7aeaf2/spark-0c892a4d-c8b9-4144-a259-8077f5316b52/spark-411cd372-224e-44c1-84ab-b0c3984a6361/-19284804851425665154479_cache to /home/szymon/spark/work/app-20150306190555-0000/0/./trial_2.11-0.9.jar
15/03/06 18:58:52 INFO Executor: Adding file:/home/szymon/spark/work/app-20150306190555-0000/0/./trial_2.11-0.9.jar to class loader
15/03/06 18:58:52 INFO TorrentBroadcast: Started reading broadcast variable 0
15/03/06 18:58:52 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to localhost/127.0.0.1:56545
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:593)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:593)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:587)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:56545
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
15/03/06 18:58:52 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
15/03/06 18:58:57 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks (after 1 retries)
java.io.IOException: Failed to connect to localhost/127.0.0.1:56545
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:56545
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
.
.
.
15/03/06 19:00:22 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
15/03/06 19:00:24 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@WORKER_MACHINE_IP:37114] -> [akka.tcp://sparkDriver@MASTER_IP:49407] disassociated! Shutting down.
15/03/06 19:00:24 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@MASTER_IP:49407] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/03/06 19:00:24 INFO Worker: Asked to kill executor app-20150306190555-0000/0
15/03/06 19:00:24 INFO ExecutorRunner: Runner thread for executor app-20150306190555-0000/0 interrupted
15/03/06 19:00:24 INFO ExecutorRunner: Killing process!
15/03/06 19:00:25 INFO Worker: Executor app-20150306190555-0000/0 finished with state KILLED exitStatus 1
15/03/06 19:00:25 INFO Worker: Cleaning up local directories for application app-20150306190555-0000
15/03/06 19:00:25 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@WORKER_MACHINE_IP:37114] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/03/06 19:00:25 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40WORKER_MACHINE_IP%3A45806-2#1549100100] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
There is also a warning, which I believe is unrelated to this issue:
15/03/06 18:07:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
You can try setting conf.set("spark.driver.host","") with the client host as the value. The client host is the machine where you start spark-shell or your other launch script.
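One way to find a value for spark.driver.host is to ask the JVM for an address that is not loopback. The sketch below is an assumption about how you might pick that value, not something Spark does for you; DriverHost and firstNonLoopbackIpv4 are hypothetical names, and on a machine with several network interfaces you may prefer to hard-code the right IP instead.

```scala
import java.net.{Inet4Address, NetworkInterface}
import scala.collection.JavaConverters._

// Hypothetical helper for illustration; not part of Spark.
object DriverHost {
  // First IPv4 address found on an interface that is up and is not the loopback interface.
  def firstNonLoopbackIpv4(): Option[String] =
    Option(NetworkInterface.getNetworkInterfaces) // may be null when no interfaces exist
      .map(_.asScala.toList)
      .getOrElse(Nil)
      .filter(ni => ni.isUp && !ni.isLoopback)
      .flatMap(_.getInetAddresses.asScala.toList)
      .collectFirst { case a: Inet4Address => a.getHostAddress }

  def main(args: Array[String]): Unit =
    println(firstNonLoopbackIpv4().getOrElse("no non-loopback IPv4 address found"))
}
```

The returned string is what you would pass as the second argument of conf.set("spark.driver.host", ...) on the machine that launches the driver.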