I start my DataStax Cassandra instance with Spark:
dse cassandra -k
I then run this program (from within Eclipse):
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object Start {
  def main(args: Array[String]): Unit = {
    println("***** 1 *****")
    val sparkConf = new SparkConf().setAppName("Start").setMaster("spark://127.0.0.1:7077")
    println("***** 2 *****")
    val sparkContext = new SparkContext(sparkConf)
    println("***** 3 *****")
  }
}
And I get the following output
***** 1 *****
***** 2 *****
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/29 15:27:50 INFO SparkContext: Running Spark version 1.5.2
15/12/29 15:27:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/29 15:27:51 INFO SecurityManager: Changing view acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: Changing modify acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nayan); users with modify permissions: Set(nayan)
15/12/29 15:27:52 INFO Slf4jLogger: Slf4jLogger started
15/12/29 15:27:52 INFO Remoting: Starting remoting
15/12/29 15:27:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.0.1.88:55126]
15/12/29 15:27:53 INFO Utils: Successfully started service 'sparkDriver' on port 55126.
15/12/29 15:27:53 INFO SparkEnv: Registering MapOutputTracker
15/12/29 15:27:53 INFO SparkEnv: Registering BlockManagerMaster
15/12/29 15:27:53 INFO DiskBlockManager: Created local directory at /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/blockmgr-21a96671-c33e-498c-83a4-bb5c57edbbfb
15/12/29 15:27:53 INFO MemoryStore: MemoryStore started with capacity 983.1 MB
15/12/29 15:27:53 INFO HttpFileServer: HTTP File server directory is /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/spark-fce0a058-9264-4f2c-8220-c32d90f11bd8/httpd-2a0efcac-2426-49c5-982a-941cfbb48c88
15/12/29 15:27:53 INFO HttpServer: Starting HTTP Server
15/12/29 15:27:53 INFO Utils: Successfully started service 'HTTP file server' on port 55127.
15/12/29 15:27:53 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/29 15:27:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/12/29 15:27:53 INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
15/12/29 15:27:54 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/29 15:27:54 INFO AppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
15/12/29 15:27:54 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.0.0.1:7077] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/12/29 15:28:14 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1f22aef0 rejected from java.util.concurrent.ThreadPoolExecutor@176cb4af[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
So something goes wrong during the creation of the Spark context.
When I look in $DSE_HOME/logs/spark, it is empty. I am not sure where else to look.
It turns out that the problem was both the Spark library version and the Scala version. DataStax was running Spark 1.4.1 and Scala 2.10.5, while my Eclipse project was using 1.5.2 and 2.11.7 respectively.
Note that both the Spark library and the Scala version appear to have to match. I tried other combinations, but it only worked when both matched.
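A quick way to catch this kind of mismatch before connecting is to compare the versions the client project is built against with the versions the cluster runs. The following is only a minimal sketch: the function names and the dictionary layout are hypothetical, the version strings are the ones quoted above, and the rule that Spark must match exactly while Scala only has to agree on major.minor is an assumption based on what worked here.

```python
def scala_binary_version(version: str) -> str:
    """Return the major.minor part of a Scala version, e.g. '2.10.5' -> '2.10'."""
    return ".".join(version.split(".")[:2])


def find_mismatches(client: dict, cluster: dict) -> list:
    """List human-readable differences between client and cluster versions."""
    problems = []
    # Assumption: the Spark version is compared exactly (the strictest check).
    if client["spark"] != cluster["spark"]:
        problems.append(f"Spark: client {client['spark']} vs cluster {cluster['spark']}")
    # Assumption: Scala only needs to agree on its major.minor version.
    if scala_binary_version(client["scala"]) != scala_binary_version(cluster["scala"]):
        problems.append(f"Scala: client {client['scala']} vs cluster {cluster['scala']}")
    return problems


# Versions taken from the post: the Eclipse project vs. the DSE node.
eclipse_project = {"spark": "1.5.2", "scala": "2.11.7"}
dse_node = {"spark": "1.4.1", "scala": "2.10.5"}

for problem in find_mismatches(eclipse_project, dse_node):
    print(problem)
```

With the versions from the post, both checks report a difference, which matches the observation that fixing only one of them was not enough.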
I am getting pretty familiar with this part of your posted error:
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://...
It can have numerous causes, pretty much all related to misconfigured IPs. First I would do whatever zero323 says, then here's my two cents: I have solved my own problems recently by using IP addresses, not hostnames, and the only config I use in a simple standalone cluster is SPARK_MASTER_IP.
Setting SPARK_MASTER_IP in $SPARK_HOME/conf/spark-env.sh on your master should then make the master web UI show the IP address you set:
spark://your.ip.address.numbers:7077
And your SparkConf setup can refer to that.
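If you want to guard against a hostname slipping back into the master URL, a small check like the one below can help. This is just a sketch using the Python standard library; the function name is made up for illustration and nothing here is part of Spark itself.

```python
import ipaddress
from urllib.parse import urlsplit


def master_host_is_ip(master_url: str) -> bool:
    """Return True if the host part of a spark:// URL is a literal IP address."""
    host = urlsplit(master_url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as IPv4/IPv6, so it is a hostname.
        return False
    return True


for url in ("spark://127.0.0.1:7077", "spark://spark-master:7077"):
    print(url, master_host_is_ip(url))
```

The same URL string can then be handed to setMaster in your SparkConf once it passes the check.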
Having said that, I am not familiar with your specific implementation but I notice in the error two occurrences containing:
/private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/
Have you looked there to see if there's a logs directory? Is that where $DSE_HOME points? Alternatively, connect to the driver at the address where it creates its web UI:
INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
and you should see a link to an error log there somewhere.
More on the IP vs. hostname thing, this very old bug is marked as Resolved but I have not figured out what they mean by Resolved, so I just tend toward IP addresses.
I am working with the following docker-compose image to build a spark standalone cluster:
---
# ----------------------------------------------------------------------------------------
# -- Docs: https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker --
# ----------------------------------------------------------------------------------------
version: "3.6"
volumes:
  shared-workspace:
    name: "hadoop-distributed-file-system"
    driver: local
services:
  jupyterlab:
    image: andreper/jupyterlab:3.0.0-spark-3.0.0
    container_name: jupyterlab
    ports:
      - 8888:8888
      - 4040:4040
    volumes:
      - shared-workspace:/opt/workspace
  spark-master:
    image: andreper/spark-master:3.0.0
    container_name: spark-master
    ports:
      - 8080:8080
      - 7077:7077
    volumes:
      - shared-workspace:/opt/workspace
  spark-worker-1:
    image: andreper/spark-worker:3.0.0
    container_name: spark-worker-1
    environment:
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=512m
    ports:
      - 8081:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master
  spark-worker-2:
    image: andreper/spark-worker:3.0.0
    container_name: spark-worker-2
    environment:
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=512m
    ports:
      - 8082:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master
I followed this guide: https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445.
Here can be found the Github repo: https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker
I can run the cluster and I can run code inside of the jupyter container, connecting to the master spark node without problems.
The problem starts when I want to run the Spark code with spark-submit. I don't really understand how the cluster works. Inside the Jupyter container I can easily see where the scripts I create are stored, but I can't find them in the spark-master container. According to docker-compose.yml, the folder where the scripts are stored is the shared volume:
volumes:
  - shared-workspace:/opt/workspace
But I cannot find this folder in any of the Spark containers.
I run spark-submit from a shell inside the Jupyter container, which holds all the scripts I am working with. My doubt is about the following command: spark-submit --master spark://spark-master:7077 <PATH to my python script>. Is the script path the path inside the Jupyter container or the path inside the spark-master container?
If I run spark-submit without specifying the master, it runs locally inside the Jupyter container without problems.
This is the python code I am executing:
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
from os.path import expanduser, join, abspath

# Assumed definition: the snippet as posted uses warehouse_location without defining it.
# The logs below show it resolving to /opt/workspace/spark-warehouse.
warehouse_location = abspath("spark-warehouse")

sparkConf = SparkConf()
sparkConf.setMaster("spark://spark-master:7077")
sparkConf.setAppName("pyspark-4")
sparkConf.set("spark.executor.memory", "2g")
sparkConf.set("spark.driver.memory", "2g")
sparkConf.set("spark.executor.cores", "1")
sparkConf.set("spark.driver.cores", "1")
sparkConf.set("spark.dynamicAllocation.enabled", "false")
sparkConf.set("spark.shuffle.service.enabled", "false")
sparkConf.set("spark.sql.warehouse.dir", warehouse_location)

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

df = spark.createDataFrame(
    [
        (1, "foo"),  # create your data here, be consistent in the types.
        (2, "bar"),
    ],
    ["id", "label"],  # add your column names here
)
df.show()  # show() prints the table itself and returns None
But when I specify the master with --master spark://spark-master:7077 and give the path where the script lives in the Jupyter container:
spark-submit --master spark://spark-master:7077 test.py
these are the logs I receive:
21/06/06 21:32:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/06/06 21:32:08 INFO SparkContext: Running Spark version 3.0.0
21/06/06 21:32:09 INFO ResourceUtils: ==============================================================
21/06/06 21:32:09 INFO ResourceUtils: Resources for spark.driver:
21/06/06 21:32:09 INFO ResourceUtils: ==============================================================
21/06/06 21:32:09 INFO SparkContext: Submitted application: pyspark-4
21/06/06 21:32:09 INFO SecurityManager: Changing view acls to: root
21/06/06 21:32:09 INFO SecurityManager: Changing modify acls to: root
21/06/06 21:32:09 INFO SecurityManager: Changing view acls groups to:
21/06/06 21:32:09 INFO SecurityManager: Changing modify acls groups to:
21/06/06 21:32:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/06/06 21:32:12 INFO Utils: Successfully started service 'sparkDriver' on port 45627.
21/06/06 21:32:12 INFO SparkEnv: Registering MapOutputTracker
21/06/06 21:32:13 INFO SparkEnv: Registering BlockManagerMaster
21/06/06 21:32:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/06/06 21:32:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/06/06 21:32:13 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/06/06 21:32:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5a81855c-3160-49a5-b9f9-9cdfe6e5ca62
21/06/06 21:32:14 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/06/06 21:32:14 INFO SparkEnv: Registering OutputCommitCoordinator
21/06/06 21:32:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/06/06 21:32:16 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://3b232f9ed93b:4040
21/06/06 21:32:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
21/06/06 21:32:20 INFO TransportClientFactory: Successfully created connection to spark-master/172.21.0.5:7077 after 284 ms (0 ms spent in bootstraps)
21/06/06 21:32:23 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20210606213223-0000
21/06/06 21:32:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46539.
21/06/06 21:32:23 INFO NettyBlockTransferService: Server created on 3b232f9ed93b:46539
21/06/06 21:32:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/06/06 21:32:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 3b232f9ed93b, 46539, None)
21/06/06 21:32:23 INFO BlockManagerMasterEndpoint: Registering block manager 3b232f9ed93b:46539 with 366.3 MiB RAM, BlockManagerId(driver, 3b232f9ed93b, 46539, None)
21/06/06 21:32:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 3b232f9ed93b, 46539, None)
21/06/06 21:32:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 3b232f9ed93b, 46539, None)
21/06/06 21:32:25 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
21/06/06 21:32:29 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/opt/workspace/spark-warehouse').
21/06/06 21:32:29 INFO SharedState: Warehouse path is '/opt/workspace/spark-warehouse'.
ESTOY AQUI¿¿
21/06/06 21:33:09 INFO CodeGenerator: Code generated in 1925.0009 ms
21/06/06 21:33:09 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
21/06/06 21:33:09 INFO DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
21/06/06 21:33:09 INFO DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
21/06/06 21:33:09 INFO DAGScheduler: Parents of final stage: List()
21/06/06 21:33:09 INFO DAGScheduler: Missing parents: List()
21/06/06 21:33:10 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
21/06/06 21:33:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.3 KiB, free 366.3 MiB)
21/06/06 21:33:11 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.9 KiB, free 366.3 MiB)
21/06/06 21:33:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 3b232f9ed93b:46539 (size: 5.9 KiB, free: 366.3 MiB)
21/06/06 21:33:11 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
21/06/06 21:33:11 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
21/06/06 21:33:11 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
21/06/06 21:33:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/06/06 21:33:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/06/06 21:33:56 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/06/06 21:34:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/06/06 21:34:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
When I execute the same code inside a Jupyter notebook, it works without problems.
Is this because the path I have to give for the script is the path where the script lives on the spark-master node? Or am I confusing things here?
I use
docker pull bitnami/spark
https://hub.docker.com/r/bitnami/spark
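On the repeated "Initial job has not accepted any resources" warning in the logs above: one thing worth checking is whether the memory the script asks for per executor fits on a worker at all. The compose file gives each worker SPARK_WORKER_MEMORY=512m, while the script sets spark.executor.memory to 2g, so this is a plausible cause, though not a confirmed one. A rough sketch of that comparison (the helper names are made up; the sizes come from the question):

```python
import re

# Binary multipliers for the size suffixes handled here.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_memory(text: str) -> int:
    """Convert a size such as '512m' or '2g' into bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised memory size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def executor_fits(executor_memory: str, worker_memory: str) -> bool:
    """True if one executor of the requested size fits on a single worker."""
    return parse_memory(executor_memory) <= parse_memory(worker_memory)


# Values from the question: the script requests 2g, each worker offers 512m.
print(executor_fits("2g", "512m"))
```

If the check fails, either lowering spark.executor.memory or raising SPARK_WORKER_MEMORY would be the thing to try; the master web UI on port 8080 shows what each worker actually offers.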
I have this error when I run my Spark scripts with version 1.6 of Spark.
My scripts are working with version 1.5.
Java version: 1.8
scala version: 2.11.7
I tried to change the system environment variable JAVA_OPTS=-Xms128m -Xmx512m many times, with different values of Xms and Xmx, but it didn't change anything.
I also tried to modify the memory settings of IntelliJ:
help/change memory settings...
file/settings/scala compiler...
Nothing worked.
There are several users on the computer. Java is installed at the root of the computer, while IntelliJ is installed in the folder of one of the users. Can that have an impact?
Here are the logs of the error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/30 17:06:54 INFO SparkContext: Running Spark version 1.6.0
20/04/30 17:06:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/30 17:06:55 INFO SecurityManager: Changing view acls to:
20/04/30 17:06:55 INFO SecurityManager: Changing modify acls to:
20/04/30 17:06:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(); users with modify permissions: Set()
20/04/30 17:06:56 INFO Utils: Successfully started service 'sparkDriver' on port 57698.
20/04/30 17:06:57 INFO Slf4jLogger: Slf4jLogger started
20/04/30 17:06:57 INFO Remoting: Starting remoting
20/04/30 17:06:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.1.5.175:57711]
20/04/30 17:06:57 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57711.
20/04/30 17:06:57 INFO SparkEnv: Registering MapOutputTracker
20/04/30 17:06:57 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at batch.BatchJob$.main(BatchJob.scala:23)
at batch.BatchJob.main(BatchJob.scala)
20/04/30 17:06:57 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at batch.BatchJob$.main(BatchJob.scala:23)
at batch.BatchJob.main(BatchJob.scala)
And the beginning of the code:
package batch
import java.lang.management.ManagementFactory
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{SaveMode, SQLContext}
object BatchJob {
  def main (args: Array[String]): Unit = {
    // get spark configuration
    val conf = new SparkConf()
      .setAppName("Lambda with Spark")
    // Check if running from IDE
    if (ManagementFactory.getRuntimeMXBean.getInputArguments.toString.contains("IntelliJ IDEA")) {
      System.setProperty("hadoop.home.dir", "C:\\Libraries\\WinUtils") // required for winutils
      conf.setMaster("local[*]")
    }
    // setup spark context
    val sc = new SparkContext(conf)
    implicit val sqlContext = new SQLContext(sc)
    ...
I finally found a solution:
add -Xms2g -Xmx4g to the VM options directly in the IntelliJ Scala Console.
That's the only thing that worked for me.
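The numbers in the error message can be reproduced by hand. As far as I know, Spark 1.6's UnifiedMemoryManager reserves 300 MB and requires the JVM's reported maximum memory to be at least 1.5 times that, which is where the 4.718592E8 in the message comes from. A small sketch of the arithmetic (the constant and function names are illustrative, not Spark's API):

```python
MIB = 1024 * 1024

# Assumption: 300 MiB reserved, minimum is 1.5x the reserved amount.
RESERVED_SYSTEM_MEMORY_BYTES = 300 * MIB
MIN_SYSTEM_MEMORY_BYTES = int(RESERVED_SYSTEM_MEMORY_BYTES * 1.5)


def heap_is_large_enough(system_memory_bytes: int) -> bool:
    """True if the JVM's max memory clears the minimum from the error message."""
    return system_memory_bytes >= MIN_SYSTEM_MEMORY_BYTES


reported = 259522560  # "System memory" value from the stack trace above

print(MIN_SYSTEM_MEMORY_BYTES)         # 471859200, i.e. 4.718592E8
print(reported / MIB)                  # 247.5 (MiB)
print(heap_is_large_enough(reported))  # False
```

So the JVM that ran the job only had about 247.5 MiB available, well under the 450 MiB minimum. Passing -Xmx in the run configuration's VM options raises the limit for the JVM that actually runs the driver.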
Spark 2.1.1 built for Hadoop 2.7.3
Scala 2.11.11
The cluster has 3 Linux RHEL 7.3 Azure VMs, running Spark Standalone Deploy Mode (no YARN or Mesos, yet)
I have created a very simple SparkStreaming job using IntelliJ, written in Scala. I'm using Maven and building the job into a fat/uber jar that contains all dependencies.
When I run the job locally it works fine. If I copy the jar to the cluster and run it with a master of local[2], it also works fine. However, if I submit the job to the cluster master, it seems unwilling to schedule any work beyond the first task. The job starts up, grabs however many events are in the Azure Event Hub, processes them successfully, and then never does any more work. It does not matter whether I submit the job to the master as a plain application or in supervised cluster mode; both behave the same way.
I've looked through all the logs I know of (master, driver (where applicable), and executor) and I am not seeing any errors or warnings that seem actionable. I've altered the log level, shown below, to show ALL/INFO/DEBUG and sifted through those logs without finding anything that seems relevant.
It may be worth noting that I had previously created several jobs in Java that connect to Kafka instead of the Azure Event Hub, and those jobs run in supervised cluster mode without an issue on this same cluster. This leads me to believe that the cluster configuration isn't the issue; it's either something in my code (below) or the Azure Event Hub.
Any thoughts on where I might check next to isolate this issue? Here is the code for my simple job.
Thanks in advance.
Note: conf.{name} indicates values I'm loading from a config file. I've tested loading and hard-coding them, both with the same result.
package streamingJob
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.eventhubs.EventHubsUtils
import org.joda.time.DateTime
object TestJob {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    sparkConf.setAppName("TestJob")
    // Uncomment to run locally
    //sparkConf.setMaster("local[2]")
    val sparkContext = new SparkContext(sparkConf)
    sparkContext.setLogLevel("ERROR")
    val streamingContext: StreamingContext = new StreamingContext(sparkContext, Seconds(1))
    val readerParams = Map[String, String] (
      "eventhubs.policyname" -> conf.policyname,
      "eventhubs.policykey" -> conf.policykey,
      "eventhubs.namespace" -> conf.namespace,
      "eventhubs.name" -> conf.name,
      "eventhubs.partition.count" -> conf.partitionCount,
      "eventhubs.consumergroup" -> conf.consumergroup
    )
    val eventData = EventHubsUtils.createDirectStreams(
      streamingContext,
      conf.namespace,
      conf.progressdir,
      Map("name" -> readerParams))
    eventData.foreachRDD(r => {
      r.foreachPartition { p => {
        p.foreach(d => {
          println(DateTime.now() + ": " + d)
        }) // end of EventData
      }} // foreachPartition
    }) // foreachRDD
    streamingContext.start()
    streamingContext.awaitTermination()
  }
}
Here is a set of logs from when I run this as an application, not cluster/supervised.
/spark/bin/spark-submit --class streamingJob.TestJob --master spark://{ip}:7077 --total-executor-cores 1 /spark/job-files/fatjar.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/06 17:52:04 INFO SparkContext: Running Spark version 2.1.1
17/11/06 17:52:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/06 17:52:05 INFO SecurityManager: Changing view acls to: root
17/11/06 17:52:05 INFO SecurityManager: Changing modify acls to: root
17/11/06 17:52:05 INFO SecurityManager: Changing view acls groups to:
17/11/06 17:52:05 INFO SecurityManager: Changing modify acls groups to:
17/11/06 17:52:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/11/06 17:52:06 INFO Utils: Successfully started service 'sparkDriver' on port 44384.
17/11/06 17:52:06 INFO SparkEnv: Registering MapOutputTracker
17/11/06 17:52:06 INFO SparkEnv: Registering BlockManagerMaster
17/11/06 17:52:06 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/06 17:52:06 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/06 17:52:06 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b5e2c0f3-2500-42c6-b057-cf5d368580ab
17/11/06 17:52:06 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/11/06 17:52:06 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/06 17:52:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/06 17:52:06 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://{ip}:4040
17/11/06 17:52:06 INFO SparkContext: Added JAR file:/spark/job-files/fatjar.jar at spark://{ip}:44384/jars/fatjar.jar with timestamp 1509990726989
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://{ip}:7077...
17/11/06 17:52:07 INFO TransportClientFactory: Successfully created connection to /{ip}:7077 after 72 ms (0 ms spent in bootstraps)
17/11/06 17:52:07 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20171106175207-0000
17/11/06 17:52:07 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44624.
17/11/06 17:52:07 INFO NettyBlockTransferService: Server created on {ip}:44624
17/11/06 17:52:07 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171106175207-0000/0 on worker-20171106173151-{ip}-46086 ({ip}:46086) with 1 cores
17/11/06 17:52:07 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO StandaloneSchedulerBackend: Granted executor ID app-20171106175207-0000/0 on hostPort {ip}:46086 with 1 cores, 1024.0 MB RAM
17/11/06 17:52:07 INFO BlockManagerMasterEndpoint: Registering block manager {ip}:44624 with 366.3 MB RAM, BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, {ip}, 44624, None)
17/11/06 17:52:07 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171106175207-0000/0 is now RUNNING
17/11/06 17:52:08 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Hello everybody, we have a kerberized HDP (Hortonworks) cluster. We can run Spark jobs from spark-submit (CLI) and from Talend Big Data, but not from Eclipse.
We have a Windows client machine where Eclipse is installed and the MIT Kerberos client for Windows is configured (TGT configuration). The goal is to run a Spark job from Eclipse. The Spark-related portion of the Java code works and has been tested via the CLI. The relevant part of the job's code is shown below.
private void setConfigurationProperties()
{
    sConfig.setAppName("abcd-name");
    sConfig.setMaster("yarn-client");
    sConfig.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    sConfig.set("spark.hadoop.yarn.resourcemanager.address", "rs.abcd.com:8032");
    sConfig.set("spark.hadoop.yarn.resourcemanager.scheduler.address", "rs.abcd.com:8030");
    sConfig.set("spark.hadoop.mapreduce.jobhistory.address", "rs.abcd.com:10020");
    sConfig.set("spark.hadoop.yarn.app.mapreduce.am.staging-dir", "/dir");
    sConfig.set("spark.executor.memory", "2g");
    sConfig.set("spark.executor.cores", "4");
    sConfig.set("spark.executor.instances", "24");
    sConfig.set("spark.yarn.am.cores", "24");
    sConfig.set("spark.yarn.am.memory", "16g");
    sConfig.set("spark.eventLog.enabled", "true");
    sConfig.set("spark.eventLog.dir", "hdfs:///spark-history");
    sConfig.set("spark.shuffle.memoryFraction", "0.4");
    sConfig.set("spark.hadoop." + "mapreduce.application.framework.path", "/hdp/apps/version/mapreduce/mapreduce.tar.gz#mr-framework");
    sConfig.set("spark.local.dir", "/tmp");
    sConfig.set("spark.hadoop.yarn.resourcemanager.principal", "rm/_HOST@ABCD.COM");
    sConfig.set("spark.hadoop.mapreduce.jobhistory.principal", "jhs/_HOST@ABCD.COM");
    sConfig.set("spark.hadoop.dfs.namenode.kerberos.principal", "nn/_HOST@ABCD.COM");
    sConfig.set("spark.hadoop.fs.defaultFS", "hdfs://hdfs.abcd.com:8020");
    sConfig.set("spark.hadoop.dfs.client.use.datanode.hostname", "true");
}
When we run the code the following error pops up:
17/04/05 23:37:06 INFO Remoting: Starting remoting
17/04/05 23:37:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@1.1.1.1:54356]
17/04/05 23:37:06 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54356.
17/04/05 23:37:06 INFO SparkEnv: Registering MapOutputTracker
17/04/05 23:37:06 INFO SparkEnv: Registering BlockManagerMaster
17/04/05 23:37:06 INFO DiskBlockManager: Created local directory at C:\tmp\blockmgr-baee2441-1977-4410-b52f-4275ff35d6c1
17/04/05 23:37:06 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
17/04/05 23:37:06 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/05 23:37:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/04/05 23:37:07 INFO SparkUI: Started SparkUI at http://1.1.1.1:4040
17/04/05 23:37:07 INFO RMProxy: Connecting to ResourceManager at rs.abcd.com/1.1.1.1:8032
17/04/05 23:37:07 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
17/04/05 23:37:07 INFO SparkUI: Stopped Spark web UI at http://1.1.1.1:4040
Please advise how to specify Kerberos authentication instead of SIMPLE in the Java code, or how to make the client request Kerberos authentication. More generally, what should the overall process look like, and what would be the right approach?
Thank you.
I want to use machine A to submit my Spark job to the cluster. A has no Spark environment, just Java. When I launch the jar, an HTTP server starts:
[steven@bj-230 ~]$ java -jar helloCluster.jar SimplyApp
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/06/10 16:54:54 INFO SparkEnv: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/06/10 16:54:54 INFO SparkEnv: Registering BlockManagerMaster
14/06/10 16:54:54 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140610165454-4393
14/06/10 16:54:54 INFO MemoryStore: MemoryStore started with capacity 1055.1 MB.
14/06/10 16:54:54 INFO ConnectionManager: Bound socket to port 59981 with id = ConnectionManagerId(bj-230,59981)
14/06/10 16:54:54 INFO BlockManagerMaster: Trying to register BlockManager
14/06/10 16:54:54 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager bj-230:59981 with 1055.1 MB RAM
14/06/10 16:54:54 INFO BlockManagerMaster: Registered BlockManager
14/06/10 16:54:54 INFO HttpServer: Starting HTTP Server
14/06/10 16:54:54 INFO HttpBroadcast: Broadcast server started at http://10.10.10.230:59233
14/06/10 16:54:54 INFO SparkEnv: Registering MapOutputTracker
14/06/10 16:54:54 INFO HttpFileServer: HTTP File server directory is /tmp/spark-bfdd02f1-3c02-4233-854f-af89542b9acf
14/06/10 16:54:54 INFO HttpServer: Starting HTTP Server
14/06/10 16:54:54 INFO SparkUI: Started Spark Web UI at http://bj-230:4040
14/06/10 16:54:54 INFO SparkContext: Added JAR hdfs://master:8020/tmp/helloCluster.jar at hdfs://master:8020/tmp/helloCluster.jar with timestamp 1402390494838
14/06/10 16:54:54 INFO AppClient$ClientActor: Connecting to master spark://master:7077...
So what is the purpose of this server? And if I am behind a NAT, is it possible to use machine A to submit my job to a remote cluster?
By the way, this execution fails. Error log:
14/06/10 16:55:05 INFO SparkDeploySchedulerBackend: Executor app-20140610165321-0005/7 removed: Command exited with code 1
14/06/10 16:55:05 ERROR AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/10 16:55:05 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/10 16:55:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
The Spark driver starts a few HTTP endpoints:
It provides a web console that shows the job progress. This HTTP endpoint has a default port of 4040, which can be changed with the configuration option spark.ui.port. You connect to it with your browser at http://your_host:4040 and can follow the job from there. It is only alive while the driver runs.
There's an additional HTTP endpoint that provides a file download service for the jars declared as dependencies. The workers contact the driver to download those dependencies. This endpoint uses a randomly assigned port. Therefore, the driver must be on a network that is routable from the Spark workers.
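Since the workers have to open connections back to the driver, a simple way to test the NAT question is to try a plain TCP connection from a worker machine to the driver's host and port. The helper below is only a sketch using the Python standard library; the function name is made up, and the host and port in the example are the ones from the log above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Run this on a worker while the driver is up. 10.10.10.230:59233 is the
    # broadcast server address printed in the driver log above.
    print(can_connect("10.10.10.230", 59233))
```

If this fails from the workers while the driver is running, the driver is not reachable from the cluster (for example because of NAT), and the executors will keep exiting as in the error log.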