Folks!
I am attempting to move Spark processing into cluster (Standalone). Previously, jobs were running successfully over cluster set up with 1 Worker node + Master on the same machine. For example one of the jobs was running without any issues on 4gb and 2 cores. The same with local[*] mode.
Now I set up the cluster in Kubernetes with 24.0 GB RAM and 6 cores - 3 Workers + Master. And getting errors. For that simple job I am using all of the resources available in cluster.
Spark 2.2.0,Client mode.
spark-submit \
--name RSS_Analysys\
--class org.MainClass \
--conf "spark.driver.extraJavaOptions=-Ddata=file:///aws/efs/data/"\
--conf spark.executor.memory=8g\
--conf spark.executor.cores=2\
--master spark://AWS-ELB:7077\
--deploy-mode client \
--packages com.squareup.okhttp:okhttp:2.7.5\
file:///aws/efs/app/rss_app.jar
Driver output:
17/07/20 16:42:13 INFO TaskSetManager: Finished task 199.0 in stage 95.0 (TID 4017) in 4 ms on 172.12.0.1 (executor 2) (200/200)
17/07/20 16:42:13 INFO TaskSchedulerImpl: Removed TaskSet 95.0, whose tasks have all completed, from pool
17/07/20 16:42:13 INFO DAGScheduler: ShuffleMapStage 95 (head at RSSProcessor.scala:76) finished in 0.670 s
17/07/20 16:42:13 INFO DAGScheduler: looking for newly runnable stages
17/07/20 16:42:13 INFO DAGScheduler: running: Set(ShuffleMapStage 96)
17/07/20 16:42:13 INFO DAGScheduler: waiting: Set(ResultStage 97)
17/07/20 16:42:13 INFO DAGScheduler: failed: Set()
17/07/20 16:42:25 WARN StandaloneAppClient$ClientEndpoint: Connection to 172.12.0.1:7077 failed; waiting for master to reconnect...
17/07/20 16:42:25 WARN StandaloneSchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
17/07/20 16:42:26 ERROR TaskSchedulerImpl: Lost executor 0 on 172.12.0.2: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/07/20 16:42:26 ERROR TaskSchedulerImpl: Lost executor 1 on 172.12.0.3: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/07/20 16:42:26 INFO DAGScheduler: Executor lost: 0 (epoch 29)
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
17/07/20 16:42:26 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_9_0 !
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(0, 172.12.0.2, 42948, None)
17/07/20 16:42:26 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
17/07/20 16:42:26 INFO DAGScheduler: Shuffle files lost for executor: 0 (epoch 29)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 93 is now unavailable on executor 0 (124/200, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 95 is now unavailable on executor 0 (126/200, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 94 is now unavailable on executor 0 (0/1, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 92 is now unavailable on executor 0 (0/1, false)
17/07/20 16:42:26 INFO DAGScheduler: Executor lost: 1 (epoch 35)
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, 172.12.0.3, 37556, None)
17/07/20 16:42:26 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
17/07/20 16:42:26 INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 35)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 91 is now unavailable on executor 1 (0/1, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 93 is now unavailable on executor 1 (65/200, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 95 is now unavailable on executor 1 (41/200, false)
17/07/20 16:42:26 ERROR TaskSchedulerImpl: Lost executor 2 on 172.12.0.1: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/07/20 16:42:26 WARN TaskSetManager: Lost task 0.0 in stage 96.0 (TID 3617, 172.12.0.1, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/07/20 16:42:26 INFO DAGScheduler: Executor lost: 2 (epoch 41)
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/20 16:42:26 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 172.12.0.1, 33264, None)
17/07/20 16:42:26 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
17/07/20 16:42:26 INFO DAGScheduler: Shuffle files lost for executor: 2 (epoch 41)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 93 is now unavailable on executor 2 (0/200, false)
17/07/20 16:42:26 INFO ShuffleMapStage: ShuffleMapStage 95 is now unavailable on executor 2 (0/200, false)
17/07/20 17:11:26 INFO BlockManagerInfo: Removed broadcast_50_piece0 on 172.12.0.4:36275 in memory (size: 12.1 KB, free: 366.2 MB)
17/07/20 17:11:26 INFO BlockManagerInfo: Removed broadcast_45_piece0 on 172.12.0.4:36275 in memory (size: 5.5 KB, free: 366.2 MB)
17/07/20 17:11:26 INFO BlockManagerInfo: Removed broadcast_49_piece0 on 172.12.0.4:36275 in memory (size: 12.1 KB, free: 366.2 MB)
Interesting, that 12 seconds passed before error (Maybe timeout settings required?)
17/07/20 16:42:13 INFO DAGScheduler: failed: Set()
17/07/20 16:42:25 WARN StandaloneAppClient$ClientEndpoint: Connection to 172.12.0.1:7077 failed; waiting for master to reconnect...
Before, I see that some stages are passed.
Any help or advise are highly appreciated.
Related
val conf = new SparkConf()
.setAppName("example")
.setMaster("local[*]")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("setWarnUnregisteredClasses","true")
When registrationRequired is set to true, it throws exception for class Person is not registered and also "org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage" is not registered
So, now in default it is false, so making setWarnUnregisteredClasses to true, it should show warning message for unregistered class encountered as provided in the documentation https://github.com/EsotericSoftware/kryo#serializer-framework? But, nothing is shown in the logs regarding serialization.
What I am trying to do is to get a list of all unregistered class into my logs by setting this property .set("setWarnUnregisteredClasses","true")
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/10 15:56:09 WARN Utils: Your hostname, knoldus-Vostro-3546 resolves to a loopback address: 127.0.1.1; using 192.168.1.113 instead (on interface enp7s0)
19/12/10 15:56:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/12/10 15:56:10 INFO SparkContext: Running Spark version 2.4.4
19/12/10 15:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/10 15:56:12 INFO SparkContext: Submitted application: kyroExample
19/12/10 15:56:14 INFO SecurityManager: Changing view acls to: knoldus
19/12/10 15:56:14 INFO SecurityManager: Changing modify acls to: knoldus
19/12/10 15:56:14 INFO SecurityManager: Changing view acls groups to:
19/12/10 15:56:14 INFO SecurityManager: Changing modify acls groups to:
19/12/10 15:56:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(knoldus); groups with view permissions: Set(); users with modify permissions: Set(knoldus); groups with modify permissions: Set()
19/12/10 15:56:17 INFO Utils: Successfully started service 'sparkDriver' on port 36235.
19/12/10 15:56:17 INFO SparkEnv: Registering MapOutputTracker
19/12/10 15:56:18 INFO SparkEnv: Registering BlockManagerMaster
19/12/10 15:56:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/10 15:56:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/10 15:56:18 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-956a186e-cfbd-4ad2-b531-9f46bff96984
19/12/10 15:56:18 INFO MemoryStore: MemoryStore started with capacity 870.9 MB
19/12/10 15:56:18 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/10 15:56:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/12/10 15:56:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.113:4040
19/12/10 15:56:19 INFO Executor: Starting executor ID driver on host localhost
19/12/10 15:56:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41737.
19/12/10 15:56:19 INFO NettyBlockTransferService: Server created on 192.168.1.113:41737
19/12/10 15:56:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/10 15:56:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.113, 41737, None)
19/12/10 15:56:19 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.113:41737 with 870.9 MB RAM, BlockManagerId(driver, 192.168.1.113, 41737, None)
19/12/10 15:56:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.113, 41737, None)
19/12/10 15:56:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.113, 41737, None)
19/12/10 15:56:21 INFO SparkContext: Starting job: take at KyroExample.scala:28
19/12/10 15:56:21 INFO DAGScheduler: Got job 0 (take at KyroExample.scala:28) with 1 output partitions
19/12/10 15:56:21 INFO DAGScheduler: Final stage: ResultStage 0 (take at KyroExample.scala:28)
19/12/10 15:56:21 INFO DAGScheduler: Parents of final stage: List()
19/12/10 15:56:21 INFO DAGScheduler: Missing parents: List()
19/12/10 15:56:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at filter at KyroExample.scala:24), which has no missing parents
19/12/10 15:56:21 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.0 KB, free 870.9 MB)
19/12/10 15:56:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1730.0 B, free 870.9 MB)
19/12/10 15:56:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.113:41737 (size: 1730.0 B, free: 870.9 MB)
19/12/10 15:56:22 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/12/10 15:56:22 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at filter at KyroExample.scala:24) (first 15 tasks are for partitions Vector(0))
19/12/10 15:56:22 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/12/10 15:56:22 WARN TaskSetManager: Stage 0 contains a task of very large size (243 KB). The maximum recommended task size is 100 KB.
19/12/10 15:56:22 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 249045 bytes)
19/12/10 15:56:22 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/12/10 15:56:23 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 293.3 KB, free 870.6 MB)
19/12/10 15:56:23 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.1.113:41737 (size: 293.3 KB, free: 870.6 MB)
19/12/10 15:56:23 INFO Executor: 1 block locks were not released by TID = 0:
[rdd_1_0]
19/12/10 15:56:23 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1132 bytes result sent to driver
19/12/10 15:56:23 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 924 ms on localhost (executor driver) (1/1)
19/12/10 15:56:23 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/12/10 15:56:23 INFO DAGScheduler: ResultStage 0 (take at KyroExample.scala:28) finished in 1.733 s
19/12/10 15:56:23 INFO DAGScheduler: Job 0 finished: take at KyroExample.scala:28, took 1.895530 s
There is no unregistered class encountered logs. Why?
I had the same problem.
The issue is that setWarnUnregisteredClasses is a Kryo configuration that currently (I actually use Spark 2.4.4) is not exposed through Spark.
You have to set the specific configuration in Kryo.
The workaround I used was to have a custom KryoRegistrator.
Then I used it in this way:
class MyKryoRegistrator extends KryoRegistrator {
override def registerClasses(kryo: Kryo): Unit = {
kryo.setRegistrationRequired(false)
kryo.setWarnUnregisteredClasses(true)
...
You are using kryo registration so custom and other classes need to be registered with kryo and also both classes should implement serialize interface.
setWarnUnregisteredClasses will give warnings and conf.set("spark.kryo.registrationRequired", "true") throws exception for classes not registered.
You have to register person and TaskCommitMessage like
conf.registerKryoClasses(Array(classOf[Person]))
I am getting an error while trying to read data from a collection.
My MongoDB instance is hosted in 192.168.1.2 while my spark instance is hosted in 1.1. The code is :
package org.sparkexample;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;
import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;
public class WordCountTask {
public static void main(String[] args) {
System.out.println("arg : " + args[0]);
//checkArgument(args.length > 1, "Please provide the path of input file as first parameter.");
new WordCountTask().run(args[0]);
}
public void run(String inputFilePath) {
SparkSession spark = SparkSession.builder()
.master("spark://192.168.1.1:7077")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://192.168.1.2/local.Test")
.config("spark.mongodb.output.uri", "mongodb://192.168.1.2/local.Test")
.getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);
System.out.println("******************************************");
System.out.println("The count is : ");
System.out.println(rdd.count());
System.out.println(rdd.first().toJson());
System.out.println("******************************************");
jsc.close();
}
}
The error(or rather info) obtained is:
INFO MongoSamplePartitioner: Could not find collection (Test),
using a single partition
Due to the above, the .first() command errors out. However, the collection does exists and I am able to access it. Can anyone let me know whats going wrong?
The full log is:
; ui acls disabled; users with view permissions: Set(mklrjv); groups with view
permissions: Set(); users with modify permissions: Set(mklrjv); groups with m
odify permissions: Set()
17/04/10 18:17:09 INFO Utils: Successfully started service 'sparkDriver' on port
34048.
17/04/10 18:17:09 INFO SparkEnv: Registering MapOutputTracker
17/04/10 18:17:09 INFO SparkEnv: Registering BlockManagerMaster
17/04/10 18:17:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storag
e.DefaultTopologyMapper for getting topology information
17/04/10 18:17:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/10 18:17:09 INFO DiskBlockManager: Created local directory at C:\Users\mra
jeev\AppData\Local\Temp\blockmgr-17cba028-2757-4f48-88ea-f8c7b33ccba9
17/04/10 18:17:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/04/10 18:17:09 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/10 18:17:09 INFO Utils: Successfully started service 'SparkUI' on port 404
0.
17/04/10 18:17:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://
192.168.1.1:4040
17/04/10 18:17:09 INFO SparkContext: Added JAR file:/C:/Projects/SparkJava/targe
t/uber-first-example-1.0-SNAPSHOT.jar at spark://192.168.1.1:34048/jars/uber-f
irst-example-1.0-SNAPSHOT.jar with timestamp 1491828429769
17/04/10 18:17:09 INFO StandaloneAppClient$ClientEndpoint: Connecting to master
spark://192.168.1.1:7077...
17/04/10 18:17:10 INFO TransportClientFactory: Successfully created connection t
o /192.168.1.1:7077 after 55 ms (0 ms spent in bootstraps)
17/04/10 18:17:10 INFO StandaloneSchedulerBackend: Connected to Spark cluster wi
th app ID app-20170410181710-0013
17/04/10 18:17:10 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-2
0170410181710-0013/0 on worker-20170410150028-192.168.1.1-33151 (192.168.1.1
:33151) with 4 cores
17/04/10 18:17:10 INFO StandaloneSchedulerBackend: Granted executor ID app-20170
410181710-0013/0 on hostPort 192.168.1.1:33151 with 4 cores, 1024.0 MB RAM
17/04/10 18:17:10 INFO Utils: Successfully started service 'org.apache.spark.net
work.netty.NettyBlockTransferService' on port 34070.
17/04/10 18:17:10 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app
-20170410181710-0013/0 is now RUNNING
17/04/10 18:17:10 INFO NettyBlockTransferService: Server created on 10.78.130.13
4:34070
17/04/10 18:17:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockR
eplicationPolicy for block replication policy
17/04/10 18:17:10 INFO BlockManagerMaster: Registering BlockManager BlockManager
Id(driver, 192.168.1.1, 34070, None)
17/04/10 18:17:10 INFO BlockManagerMasterEndpoint: Registering block manager 10.
78.130.134:34070 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.1, 34070,
None)
17/04/10 18:17:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerI
d(driver, 192.168.1.1, 34070, None)
17/04/10 18:17:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(dr
iver, 192.168.1.1, 34070, None)
17/04/10 18:17:11 INFO EventLoggingListener: Logging events to file:/C:/tmp/spar
k-events/app-20170410181710-0013
17/04/10 18:17:11 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for
scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/04/10 18:17:11 INFO SharedState: Warehouse path is 'file:/C:/Projects/SparkJa
va/spark-warehouse/'.
17/04/10 18:17:12 WARN SparkSession$Builder: Using an existing SparkSession; som
e configuration may not take effect.
17/04/10 18:17:12 INFO MemoryStore: Block broadcast_0 stored as values in memory
(estimated size 216.0 B, free 366.3 MB)
17/04/10 18:17:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in
memory (estimated size 402.0 B, free 366.3 MB)
17/04/10 18:17:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 1
0.78.130.134:34070 (size: 402.0 B, free: 366.3 MB)
17/04/10 18:17:12 INFO SparkContext: Created broadcast 0 from broadcast at Mongo
Spark.scala:499
******************************************
The count is :
17/04/10 18:17:13 INFO cluster: Cluster created with settings {hosts=[10.78.130.
149:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30
000 ms', maxWaitQueueSize=500}
17/04/10 18:17:13 INFO cluster: Cluster description not yet available. Waiting f
or 30000 ms before timing out
17/04/10 18:17:13 INFO connection: Opened connection [connectionId{localValue:1,
serverValue:107}] to 192.168.1.2:27017
17/04/10 18:17:13 INFO cluster: Monitor thread successfully connected to server
with description ServerDescription{address=192.168.1.2:27017, type=STANDALONE,
state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWire
Version=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=117621
8}
17/04/10 18:17:13 INFO MongoClientCache: Creating MongoClient: [192.168.1.2:27
017]
17/04/10 18:17:13 INFO connection: Opened connection [connectionId{localValue:2,
serverValue:108}] to 192.168.1.2:27017
17/04/10 18:17:13 INFO MongoSamplePartitioner: Could not find collection (Test),
using a single partition
17/04/10 18:17:13 INFO SparkContext: Starting job: count at WordCountTask.java:3
1
17/04/10 18:17:13 INFO DAGScheduler: Got job 0 (count at WordCountTask.java:31)
with 1 output partitions
17/04/10 18:17:13 INFO DAGScheduler: Final stage: ResultStage 0 (count at WordCo
untTask.java:31)
17/04/10 18:17:13 INFO DAGScheduler: Parents of final stage: List()
17/04/10 18:17:13 INFO DAGScheduler: Missing parents: List()
17/04/10 18:17:13 INFO DAGScheduler: Submitting ResultStage 0 (MongoRDD[0] at RD
D at MongoRDD.scala:52), which has no missing parents
17/04/10 18:17:13 INFO MemoryStore: Block broadcast_1 stored as values in memory
(estimated size 3.0 KB, free 366.3 MB)
17/04/10 18:17:13 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in
memory (estimated size 1855.0 B, free 366.3 MB)
17/04/10 18:17:13 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 1
0.78.130.134:34070 (size: 1855.0 B, free: 366.3 MB)
17/04/10 18:17:13 INFO SparkContext: Created broadcast 1 from broadcast at DAGSc
heduler.scala:996
17/04/10 18:17:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage
0 (MongoRDD[0] at RDD at MongoRDD.scala:52)
17/04/10 18:17:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/04/10 18:17:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered
executor NettyRpcEndpointRef(null) (192.168.1.1:34090) with ID 0
17/04/10 18:17:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10
.78.130.134, executor 0, partition 0, ANY, 6112 bytes)
17/04/10 18:17:15 INFO BlockManagerMasterEndpoint: Registering block manager 10.
78.130.134:34108 with 366.3 MB RAM, BlockManagerId(0, 192.168.1.1, 34108, None
)
17/04/10 18:17:18 INFO MongoClientCache: Closing MongoClient: [192.168.1.2:270
17]
17/04/10 18:17:18 INFO connection: Closed connection [connectionId{localValue:2,
serverValue:108}] to 192.168.1.2:27017 because the pool has been closed.
17/04/10 18:17:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 1
0.78.130.134:34108 (size: 1855.0 B, free: 366.3 MB)
17/04/10 18:17:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 1
0.78.130.134:34108 (size: 402.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in
34054 ms on 192.168.1.1 (executor 0) (1/1)
17/04/10 18:17:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have
all completed, from pool
17/04/10 18:17:49 INFO DAGScheduler: ResultStage 0 (count at WordCountTask.java:
31) finished in 35.409 s
17/04/10 18:17:49 INFO DAGScheduler: Job 0 finished: count at WordCountTask.java
:31, took 35.653876 s
0
17/04/10 18:17:49 INFO SparkContext: Starting job: first at WordCountTask.java:3
2
17/04/10 18:17:49 INFO DAGScheduler: Got job 1 (first at WordCountTask.java:32)
with 1 output partitions
17/04/10 18:17:49 INFO DAGScheduler: Final stage: ResultStage 1 (first at WordCo
untTask.java:32)
17/04/10 18:17:49 INFO DAGScheduler: Parents of final stage: List()
17/04/10 18:17:49 INFO DAGScheduler: Missing parents: List()
17/04/10 18:17:49 INFO DAGScheduler: Submitting ResultStage 1 (MongoRDD[0] at RD
D at MongoRDD.scala:52), which has no missing parents
17/04/10 18:17:49 INFO MemoryStore: Block broadcast_2 stored as values in memory
(estimated size 3.2 KB, free 366.3 MB)
17/04/10 18:17:49 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in
memory (estimated size 1926.0 B, free 366.3 MB)
17/04/10 18:17:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 1
0.78.130.134:34070 (size: 1926.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO SparkContext: Created broadcast 2 from broadcast at DAGSc
heduler.scala:996
17/04/10 18:17:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage
1 (MongoRDD[0] at RDD at MongoRDD.scala:52)
17/04/10 18:17:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/04/10 18:17:49 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 10
.78.130.134, executor 0, partition 0, ANY, 6194 bytes)
17/04/10 18:17:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 1
0.78.130.134:34108 (size: 1926.0 B, free: 366.3 MB)
17/04/10 18:17:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in
56 ms on 192.168.1.1 (executor 0) (1/1)
17/04/10 18:17:49 INFO DAGScheduler: ResultStage 1 (first at WordCountTask.java:
32) finished in 0.057 s
17/04/10 18:17:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have
all completed, from pool
17/04/10 18:17:49 INFO DAGScheduler: Job 1 finished: first at WordCountTask.java
:32, took 0.076634 s
Exception in thread "main" java.lang.UnsupportedOperationException: empty collec
tion
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1369)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.s
cala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.s
cala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.first(RDD.scala:1366)
at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:5
38)
at org.apache.spark.api.java.AbstractJavaRDDLike.first(JavaRDDLike.scala
:45)
at org.sparkexample.WordCountTask.run(WordCountTask.java:32)
at org.sparkexample.WordCountTask.main(WordCountTask.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSub
mit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:18
7)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/04/10 18:17:49 INFO SparkContext: Invoking stop() from shutdown hook
17/04/10 18:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.1:404
0
17/04/10 18:17:49 INFO StandaloneSchedulerBackend: Shutting down all executors
17/04/10 18:17:49 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each
executor to shut down
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34132
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEnd
point stopped!
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34113
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 WARN TransportChannelHandler: Exception in connection from /10
.78.130.134:34090
java.io.IOException: An existing connection was forcibly closed by the remote ho
st
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirect
ByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketCha
nnel.java:275)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:652)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
17/04/10 18:17:49 INFO MemoryStore: MemoryStore cleared
17/04/10 18:17:49 INFO BlockManager: BlockManager stopped
17/04/10 18:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/10 18:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
17/04/10 18:17:49 INFO SparkContext: Successfully stopped SparkContext
17/04/10 18:17:49 INFO ShutdownHookManager: Shutdown hook called
17/04/10 18:17:49 INFO ShutdownHookManager: Deleting directory C:\Users\mklrjv\
AppData\Local\Temp\spark-3213c1b3-9a85-42b0-ba04-6e0e46a90d98
I am trying to submit the job using command
"spark-submit --class it.polimi.dice.spark.WordCount --master yarn-master --conf spark.cassandra.connection.host=10.0.0.5 --num-executors 1 --deploy-mode client --driver-memory 512m --executor-memory 512m /home/useruser/temp/spark-cassandra-example/target/scala-2.10/spark-cassandra-exmaple-assembly-1.0.jar"
But I am getting error although I have tried the same thing using spark-shell and it is working it means that version of the spark-Cassandra connector 1.6.0-M1(spark version is 1.6.0 , scala 2.10.5, cassandra 3.3.0) and other configurations are fine. Here is the result I am getting after using spark-submit command.
16/11/21 09:39:03 INFO Client: Application report for application_1479668866076_0014 (state: ACCEPTED)
16/11/21 09:39:04 INFO Client: Application report for application_1479668866076_0014 (state: ACCEPTED)
16/11/21 09:39:05 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/11/21 09:39:05 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1479668866076_0014), /proxy/application_1479668866076_0014
16/11/21 09:39:05 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/11/21 09:39:05 INFO Client: Application report for application_1479668866076_0014 (state: RUNNING)
16/11/23 10:10:38 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/11/23 10:10:38 INFO Cluster: New Cassandra host /10.0.0.4:9042 added
16/11/23 10:10:38 INFO Cluster: New Cassandra host /10.0.0.5:9042 added
16/11/23 10:10:38 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/11/23 10:10:39 INFO SparkContext: Starting job: fold at WordCount.scala:17
16/11/23 10:10:39 INFO DAGScheduler: Got job 0 (fold at WordCount.scala:17) with 2 output partitions
16/11/23 10:10:39 INFO DAGScheduler: Final stage: ResultStage 0 (fold at WordCount.scala:17)
16/11/23 10:10:39 INFO DAGScheduler: Parents of final stage: List()
16/11/23 10:10:39 INFO DAGScheduler: Missing parents: List()
16/11/23 10:10:39 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15), which has no missing parents
16/11/23 10:10:39 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.1 KB, free 7.1 KB)
16/11/23 10:10:39 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.7 KB, free 10.9 KB)
16/11/23 10:10:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.0.6:58236 (size: 3.7 KB, free: 143.6 MB)
16/11/23 10:10:39 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/11/23 10:10:39 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15)
16/11/23 10:10:39 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
16/11/23 10:10:39 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sandbox.hortonworks.com, partition 0,RACK_LOCAL, 29218 bytes)
16/11/23 10:10:40 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox.hortonworks.com:51013 (size: 3.7 KB, free: 143.6 MB)
16/11/23 10:10:43 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox.hortonworks.com, partition 1,RACK_LOCAL, 29156 bytes)
16/11/23 10:10:43 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, sandbox.hortonworks.com): java.io.IOException: Failed to open native connection to Cassandra at {10.0.0.4, 10.0.0.5}:9042
16/11/23 10:10:45 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, sandbox.hortonworks.com, partition 0,RACK_LOCAL, 29218 bytes)
16/11/23 10:10:45 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor sandbox.hortonworks.com: java.io.IOException (Failed to open native connection to Cassandra at {10.0.0.4, 10.0.0.5}:9042) [duplicate 1]
Can anyone kindly help me ho wi can fix this issue.
Here's what I'm trying to do:
using spark-submit to submit a packaged / compiled (using sbt 0.13.12) scala programm to my virtualized "cluster" running hdp 2.4 (Spark 1.6.0, Scala 2.10.5) using virtual box
using the --files option to copy a text file "foo.txt" (which is located in the project root) from the "submitting" Windows machine (which is also running Spark 1.6.0 and Scala 2.10.5) to the working directories of executors (as described by spark-submit -h)
passing the textfile as first argument to my application
finally: reading in the file and counting the lines
The command for submitting is
spark-submit ^
--class boern.spark.SparkMeApp ^
--master "spark://127.0.0.1:7077" ^
--files "foo.txt" ^
target/scala-2.11/sparkme-project_2.11-1.0.jar foo.txt
The interesting part of code is
val fileName = args(0)
println(s"argument 0 is $fileName")
val lines = sc.textFile(fileName).cache
val c = lines.count /** line 37 */
The error (short version) I'm getting is:
INFO DAGScheduler: Job 0 failed: count at SparkMeApp.scala:37, Exception, Job aborted: java
.io.FileNotFoundException: File file:/E:/myProject/foo.txt does not exist
After two days of a combination "bruteforcing" and reading documentation I am still lost... Am I wrong, that sc.textFile(fileName).cache is executed on the workers and everything which is not preceeded by sc on master? Is using SparkFiles the way to go?
Stacktrace
E:\myProject\>spark-submit --verbose --class boern.spark.SparkMeApp --master "spark://127.0.0.1:7077" --files "foo.txt" target/scala-2.11/sparkme-project_2.11-1.0.jar foo.txt
Using properties file: null
Parsed arguments:
master spark://127.0.0.1:7077
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile null
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files file:/E:/myProject/foo.txt
pyFiles null
archives null
mainClass boern.spark.SparkMeApp
primaryResource file:/E:/myProject/target/scala-2.11/sparkme-project_2.11-1.0.jar
name boern.spark.SparkMeApp
childArgs [foo.txt]
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file null:
Main class:
boern.spark.SparkMeApp
Arguments:
foo.txt
System properties:
SPARK_SUBMIT -> true
spark.files -> file:/E:/myProject/foo.txt
spark.app.name -> boern.spark.SparkMeApp
spark.jars -> file:/E:/myProject/target/scala-2.11/sparkme-project_2.11-1.0.jar
spark.submit.deployMode -> client
spark.master -> spark://127.0.0.1:7077
Classpath elements:
file:/E:/myProject/target/scala-2.11/sparkme-project_2.11-1.0.jar
Working directory is E:\myProject\sbtmanual
Files:
\CONF.ENI
\mw.csv
\mw_out.csv
\pagefile.sys
\temp.rds
args:
foo.txt
config set.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/15 14:36:21 INFO SparkContext: Running Spark version 1.6.0
16/09/15 14:36:22 INFO SecurityManager: Changing view acls to: Boern
16/09/15 14:36:22 INFO SecurityManager: Changing modify acls to: Boern
16/09/15 14:36:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Boern); users with modify permissions: Set(Boern)
16/09/15 14:36:22 INFO Utils: Successfully started service 'sparkDriver' on port 59716.
16/09/15 14:36:23 INFO Slf4jLogger: Slf4jLogger started
16/09/15 14:36:23 INFO Remoting: Starting remoting
16/09/15 14:36:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#192.168.56.1:59729]
16/09/15 14:36:23 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 59729.
16/09/15 14:36:23 INFO SparkEnv: Registering MapOutputTracker
16/09/15 14:36:23 INFO SparkEnv: Registering BlockManagerMaster
16/09/15 14:36:23 INFO DiskBlockManager: Created local directory at C:\Users\Boern\AppData\Local\Temp\blockmgr-c7ee2dab-ea00-4ae5-9f06-c6ab74f135e5
16/09/15 14:36:23 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/09/15 14:36:23 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/15 14:36:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/09/15 14:36:23 INFO Utils: Successfully started service 'SparkUI' on port 4041.
16/09/15 14:36:23 INFO SparkUI: Started SparkUI at http://192.168.56.1:4041
16/09/15 14:36:23 INFO HttpFileServer: HTTP File server directory is C:\Users\Boern\AppData\Local\Temp\spark-2736b20a-fc90-40e8-a7ad-2d8cac8001f2\httpd-14abb177-9801-403c-9df9-84afb2e87d70
16/09/15 14:36:23 INFO HttpServer: Starting HTTP Server
16/09/15 14:36:23 INFO Utils: Successfully started service 'HTTP file server' on port 59746.
16/09/15 14:36:23 INFO SparkContext: Added JAR file:/E:/myProject/target/scala-2.11/sparkme-project_2.11-1.0.jar at http://192.168.56.1:59746/jars/sparkme-project_2.11-1.0.jar with timestamp 1473942983631
16/09/15 14:36:23 INFO Utils: Copying E:\myProject\sbtmanual\foo.txt to C:\Users\Boern\AppData\Local\Temp\spark-2736b20a-fc90-40e8-a7ad-2d8cac8001f2\userFiles-7849db02-01ff-40ea-9250-62b87d854f4c\foo.txt
16/09/15 14:36:23 INFO SparkContext: Added file file:/E:/myProject/foo.txt at http://192.168.56.1:59746/files/foo.txt with timestamp 1473942983695
16/09/15 14:36:23 INFO AppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
16/09/15 14:36:34 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160915123633-0015
16/09/15 14:36:34 INFO AppClient$ClientEndpoint: Executor added: app-20160915123633-0015/0 on worker-20160915105800-10.0.2.15-44537 (10.0.2.15:44537) with 4 cores
16/09/15 14:36:34 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160915123633-0015/0 on hostPort 10.0.2.15:44537 with 4 cores, 1024.0 MB RAM
16/09/15 14:36:34 INFO AppClient$ClientEndpoint: Executor updated: app-20160915123633-0015/0 is now RUNNING
16/09/15 14:36:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59781.
16/09/15 14:36:34 INFO NettyBlockTransferService: Server created on 59781
16/09/15 14:36:34 INFO BlockManagerMaster: Trying to register BlockManager
16/09/15 14:36:34 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:59781 with 511.1 MB RAM, BlockManagerId(driver, 192.168.56.1, 59781)
16/09/15 14:36:34 INFO BlockManagerMaster: Registered BlockManager
16/09/15 14:36:34 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
sc set.
argument 0 is foo.txt
16/09/15 14:36:34 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 208.5 KB, free 208.5 KB)
16/09/15 14:36:34 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.3 KB, free 227.8 KB)
16/09/15 14:36:34 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:59781 (size: 19.3 KB, free: 511.1 MB)
16/09/15 14:36:34 INFO SparkContext: Created broadcast 0 from textFile at SparkMeApp.scala:39
16/09/15 14:36:34 INFO FileInputFormat: Total input paths to process : 1
16/09/15 14:36:34 INFO SparkContext: Starting job: count at SparkMeApp.scala:41
16/09/15 14:36:34 INFO DAGScheduler: Got job 0 (count at SparkMeApp.scala:41) with 2 output partitions
16/09/15 14:36:34 INFO DAGScheduler: Final stage: ResultStage 0 (count at SparkMeApp.scala:41)
16/09/15 14:36:34 INFO DAGScheduler: Parents of final stage: List()
16/09/15 14:36:34 INFO DAGScheduler: Missing parents: List()
16/09/15 14:36:34 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at SparkMeApp.scala:39), which has no missing parents
16/09/15 14:36:34 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 230.7 KB)
16/09/15 14:36:34 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1752.0 B, free 232.4 KB)
16/09/15 14:36:34 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.56.1:59781 (size: 1752.0 B, free: 511.1 MB)
16/09/15 14:36:34 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/09/15 14:36:34 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at SparkMeApp.scala:39)
16/09/15 14:36:34 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/09/15 14:36:37 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (BoernsPC:59783) with ID 0
16/09/15 14:36:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, BoernsPC, partition 0,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, BoernsPC, partition 1,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:47 INFO BlockManagerMasterEndpoint: Registering block manager BoernsPC:48448 with 511.5 MB RAM, BlockManagerId(0, BoernsPC, 48448)
16/09/15 14:36:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on BoernsPC:48448 (size: 1752.0 B, free: 511.5 MB)
16/09/15 14:36:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on BoernsPC:48448 (size: 19.3 KB, free: 511.5 MB)
16/09/15 14:36:49 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, BoernsPC): java.io.FileNotFoundException: File file:/E:/myProject/foo.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 1]
16/09/15 14:36:49 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, BoernsPC, partition 0,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, BoernsPC, partition 1,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 2]
16/09/15 14:36:49 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, BoernsPC, partition 1,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 3]
16/09/15 14:36:49 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 5, BoernsPC, partition 0,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 4]
16/09/15 14:36:49 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, BoernsPC, partition 0,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 5]
16/09/15 14:36:49 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, BoernsPC, partition 1,PROCESS_LOCAL, 2286 bytes)
16/09/15 14:36:49 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 6]
16/09/15 14:36:49 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
16/09/15 14:36:49 INFO TaskSchedulerImpl: Cancelling stage 0
16/09/15 14:36:49 INFO TaskSchedulerImpl: Stage 0 was cancelled
16/09/15 14:36:49 INFO DAGScheduler: ResultStage 0 (count at SparkMeApp.scala:41) failed in 14,616 s
16/09/15 14:36:49 INFO DAGScheduler: Job 0 failed: count at SparkMeApp.scala:41, took 14,694943 s
16/09/15 14:36:49 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor BoernsPC: java.io.FileNotFoundException (File file:/E:/myProject/foo.txt does not exist) [duplicate 7]
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, BoernsPC): java.io.FileNotFoundException: File file:/E:/myProject/foo.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at boern.spark.SparkMeApp$.main(SparkMeApp.scala:41)
at boern.spark.SparkMeApp.main(SparkMeApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File file:/E:/myProject/foo.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/15 14:36:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/09/15 14:36:49 INFO SparkContext: Invoking stop() from shutdown hook
16/09/15 14:36:49 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4041
16/09/15 14:36:49 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/09/15 14:36:49 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/09/15 14:36:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/15 14:36:49 INFO MemoryStore: MemoryStore cleared
16/09/15 14:36:49 INFO BlockManager: BlockManager stopped
16/09/15 14:36:49 INFO BlockManagerMaster: BlockManagerMaster stopped
16/09/15 14:36:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/15 14:36:49 INFO SparkContext: Successfully stopped SparkContext
16/09/15 14:36:49 INFO ShutdownHookManager: Shutdown hook called
16/09/15 14:36:49 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/09/15 14:36:49 INFO ShutdownHookManager: Deleting directory C:\Users\Boern\AppData\Local\Temp\spark-2736b20a-fc90-40e8-a7ad-2d8cac8001f2
16/09/15 14:36:49 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/09/15 14:36:49 INFO ShutdownHookManager: Deleting directory C:\Users\Boern\AppData\Local\Temp\spark-2736b20a-fc90-40e8-a7ad-2d8cac8001f2\httpd-14abb177-9801-403c-9df9-84afb2e87d70
1、version
spark:2.0.0
scala:2.11.8
java:1.8.0_91
hadoop:2.7.2
2、question:
When I submit scala program to spark on yarn,it throw a exception:
Caused by: java.lang.IllegalStateException: Library directory '/opt/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1471514504287_0021/container_1471514504287_0021_01_000002/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
3、command
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.mllib.learning.recommend.CollaborativeFilteringSpark collaborativeFilteringSpark.jar
4、all logs:
16/08/19 11:07:35 INFO SparkContext: Running Spark version 2.0.0
16/08/19 11:07:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/19 11:07:36 INFO SecurityManager: Changing view acls to: hadoop
16/08/19 11:07:36 INFO SecurityManager: Changing modify acls to: hadoop
16/08/19 11:07:36 INFO SecurityManager: Changing view acls groups to:
16/08/19 11:07:36 INFO SecurityManager: Changing modify acls groups to:
16/08/19 11:07:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
16/08/19 11:07:36 INFO Utils: Successfully started service 'sparkDriver' on port 43981.
16/08/19 11:07:36 INFO SparkEnv: Registering MapOutputTracker
16/08/19 11:07:36 INFO SparkEnv: Registering BlockManagerMaster
16/08/19 11:07:36 INFO DiskBlockManager: Created local directory at /opt/spark/blockmgr-57cf9a28-536c-4f03-83cc-c6a59cdeb825
16/08/19 11:07:36 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
16/08/19 11:07:36 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/19 11:07:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/19 11:07:37 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.137.101:4040
16/08/19 11:07:37 INFO SparkContext: Added JAR file:/home/hadoop/spark_program/scala/collaborativeFilteringSpark.jar at spark://192.168.137.101:43981/jars/collaborativeFilteringSpark.jar with timestamp 1471576057423
16/08/19 11:07:38 INFO RMProxy: Connecting to ResourceManager at dev-01/192.168.137.101:8032
16/08/19 11:07:38 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/08/19 11:07:38 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/08/19 11:07:38 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/19 11:07:38 INFO Client: Setting up container launch context for our AM
16/08/19 11:07:38 INFO Client: Setting up the launch environment for our AM container
16/08/19 11:07:38 INFO Client: Preparing resources for our AM container
16/08/19 11:07:39 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/19 11:07:40 INFO Client: Uploading resource file:/opt/spark/spark-e7da4489-d07e-4c42-aa50-be789ad1943e/__spark_libs__7265506257548877328.zip -> hdfs://dev-01:9000/user/hadoop/.sparkStaging/application_1471514504287_0021/__spark_libs__7265506257548877328.zip
16/08/19 11:07:44 INFO Client: Uploading resource file:/opt/spark/spark-e7da4489-d07e-4c42-aa50-be789ad1943e/__spark_conf__3473502575984181564.zip -> hdfs://dev-01:9000/user/hadoop/.sparkStaging/application_1471514504287_0021/__spark_conf__.zip
16/08/19 11:07:44 INFO SecurityManager: Changing view acls to: hadoop
16/08/19 11:07:44 INFO SecurityManager: Changing modify acls to: hadoop
16/08/19 11:07:44 INFO SecurityManager: Changing view acls groups to:
16/08/19 11:07:44 INFO SecurityManager: Changing modify acls groups to:
16/08/19 11:07:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
16/08/19 11:07:44 INFO Client: Submitting application application_1471514504287_0021 to ResourceManager
16/08/19 11:07:44 INFO YarnClientImpl: Submitted application application_1471514504287_0021
16/08/19 11:07:44 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1471514504287_0021 and attemptId None
16/08/19 11:07:45 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:45 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1471576064764
final status: UNDEFINED
tracking URL: http://dev-01:8088/proxy/application_1471514504287_0021/
user: hadoop
16/08/19 11:07:46 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:47 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:48 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:49 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:50 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:51 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:52 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:53 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:54 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:55 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/08/19 11:07:55 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> dev-01, PROXY_URI_BASES -> http://dev-01:8088/proxy/application_1471514504287_0021), /proxy/application_1471514504287_0021
16/08/19 11:07:55 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/08/19 11:07:55 INFO Client: Application report for application_1471514504287_0021 (state: ACCEPTED)
16/08/19 11:07:56 INFO Client: Application report for application_1471514504287_0021 (state: RUNNING)
16/08/19 11:07:56 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.137.102
ApplicationMaster RPC port: 0
queue: default
start time: 1471576064764
final status: UNDEFINED
tracking URL: http://dev-01:8088/proxy/application_1471514504287_0021/
user: hadoop
16/08/19 11:07:56 INFO YarnClientSchedulerBackend: Application application_1471514504287_0021 has started running.
16/08/19 11:07:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46171.
16/08/19 11:07:56 INFO NettyBlockTransferService: Server created on 192.168.137.101:46171
16/08/19 11:07:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.137.101, 46171)
16/08/19 11:07:56 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.137.101:46171 with 413.9 MB RAM, BlockManagerId(driver, 192.168.137.101, 46171)
16/08/19 11:07:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.137.101, 46171)
16/08/19 11:08:03 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.137.102:42406) with ID 1
16/08/19 11:08:03 INFO BlockManagerMasterEndpoint: Registering block manager dev-02:35791 with 413.9 MB RAM, BlockManagerId(1, dev-02, 35791)
16/08/19 11:08:05 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.137.102:42410) with ID 2
16/08/19 11:08:05 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/19 11:08:05 INFO BlockManagerMasterEndpoint: Registering block manager dev-02:37169 with 413.9 MB RAM, BlockManagerId(2, dev-02, 37169)
16/08/19 11:08:06 INFO SparkContext: Starting job: foreach at CollaborativeFilteringSpark.scala:62
16/08/19 11:08:06 INFO DAGScheduler: Got job 0 (foreach at CollaborativeFilteringSpark.scala:62) with 2 output partitions
16/08/19 11:08:06 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at CollaborativeFilteringSpark.scala:62)
16/08/19 11:08:06 INFO DAGScheduler: Parents of final stage: List()
16/08/19 11:08:06 INFO DAGScheduler: Missing parents: List()
16/08/19 11:08:06 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at CollaborativeFilteringSpark.scala:18), which has no missing parents
16/08/19 11:08:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1432.0 B, free 413.9 MB)
16/08/19 11:08:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1035.0 B, free 413.9 MB)
16/08/19 11:08:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.137.101:46171 (size: 1035.0 B, free: 413.9 MB)
16/08/19 11:08:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/19 11:08:06 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at CollaborativeFilteringSpark.scala:18)
16/08/19 11:08:06 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
16/08/19 11:08:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, dev-02, partition 0, PROCESS_LOCAL, 5417 bytes)
16/08/19 11:08:06 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, dev-02, partition 1, PROCESS_LOCAL, 5423 bytes)
16/08/19 11:08:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 2 hostname: dev-02.
16/08/19 11:08:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: dev-02.
16/08/19 11:08:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on dev-02:37169 (size: 1035.0 B, free: 413.9 MB)
16/08/19 11:08:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on dev-02:35791 (size: 1035.0 B, free: 413.9 MB)
16/08/19 11:08:13 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, dev-02): java.lang.ExceptionInInitializerError
at org.apache.spark.mllib.learning.recommend.CollaborativeFilteringSpark$$anonfun$main$1.apply(CollaborativeFilteringSpark.scala:64)
at org.apache.spark.mllib.learning.recommend.CollaborativeFilteringSpark$$anonfun$main$1.apply(CollaborativeFilteringSpark.scala:62)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Library directory '/opt/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1471514504287_0021/container_1471514504287_0021_01_000002/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:368)
at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:500)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.mllib.learning.recommend.CollaborativeFilteringSpark$.<init>(CollaborativeFilteringSpark.scala:16)
at org.apache.spark.mllib.learning.recommend.CollaborativeFilteringSpark$.<clinit>(CollaborativeFilteringSpark.scala)
... 14 more
16/08/19 11:08:13 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 2, dev-02, partition 1, PROCESS_LOCAL, 5423 bytes)
16/08/19 11:08:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 2 on executor id: 1 hostname: dev-02.
16/08/19 11:08:13 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor dev-02: java.lang.ExceptionInInitializerError (null) [duplicate 1]
16/08/19 11:08:13 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, dev-02, partition 0, PROCESS_LOCAL, 5417 bytes)
16/08/19 11:08:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 3 on executor id: 2 hostname: dev-02.
16/08/19 11:08:14 WARN TransportChannelHandler: Exception in connection from /192.168.137.102:42406
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
16/08/19 11:08:14 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
16/08/19 11:08:14 INFO DAGScheduler: Executor lost: 1 (epoch 0)
16/08/19 11:08:14 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/08/19 11:08:14 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, dev-02, 35791)
16/08/19 11:08:14 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
16/08/19 11:08:14 WARN TransportChannelHandler: Exception in connection from /192.168.137.102:42410
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
16/08/19 11:08:14 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 2.
16/08/19 11:08:14 INFO DAGScheduler: Executor lost: 2 (epoch 1)
16/08/19 11:08:14 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
16/08/19 11:08:14 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, dev-02, 37169)
16/08/19 11:08:14 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
16/08/19 11:08:14 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1471514504287_0021_01_000002 on host: dev-02. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1471514504287_0021_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 50
16/08/19 11:08:14 ERROR YarnScheduler: Lost executor 1 on dev-02: Container marked as failed: container_1471514504287_0021_01_000002 on host: dev-02. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1471514504287_0021_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 50
Make sure that SPARK_HOME environment variable is extracted properly at your cluster. Such error happen when spark-shell try to find spark libraries but because SPARK_HOME is not set it can't find libraries.