Spark shell error when running with --num-executors: YARN application has exited unexpectedly with state FAILED - scala

I have just installed spark-3.3.1 and am trying to run spark-shell on YARN with --num-executors, but the job is failing.
I am doing this for the first time and am unable to identify the cause of the job failure here.
adminn@master:~$ spark-shell --master yarn --num-executors 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/07 07:13:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/07 07:13:40 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = yarn, app id = application_1673055142624_0003).
Spark session available as 'spark'.
23/01/07 07:14:17 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
23/01/07 07:14:17 ERROR YarnClientSchedulerBackend: Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:558)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:277)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:926)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:925)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:925)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:957)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:34213
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:34213
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
23/01/07 07:14:17 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to send shutdown message before the AM has registered!
23/01/07 07:14:17 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_352)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
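The driver-side output above only shows that the ApplicationMaster exited; the real failure reason lives in the AM container logs, which (as the error message itself suggests) can be pulled with the YARN CLI using the application id printed above:
yarn logs -applicationId application_1673055142624_0003
One common culprit on a fresh single-node setup is hostname resolution: the driver advertised itself as localhost (note the Web UI URL above), so the ApplicationMaster's attempt to connect back to localhost/127.0.0.1:34213 points at its own container rather than at the driver. If the AM logs confirm that, setting spark.driver.host to the master's real address, or fixing the /etc/hosts mapping, is a reasonable first thing to try.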

Related

Spark application erroring out because the driver is out of memory even though lots of free memory is still available

Can someone please help me figure out why my simple Spark application requires huge driver memory? Even though I allocated about 112GB, my application fails at about 67GB.
Thanks in advance.
The Spark driver is using huge memory to run a simple application.
I allocated about 112G of memory for running my application when doing spark-submit.
At the start of the job, I see the message below in the logs:
1019 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore started with capacity 67.0 GiB
My application fails with this error message:
java.lang.IllegalStateException: dag-scheduler-event-loop has already been stopped accidentally.
at org.apache.spark.util.EventLoop.post(EventLoop.scala:107)
at org.apache.spark.scheduler.DAGScheduler.taskStarted(DAGScheduler.scala:283)
at org.apache.spark.scheduler.TaskSetManager.prepareLaunchingTask(TaskSetManager.scala:539)
at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:478)
at scala.Option.map(Option.scala:230)
at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:455)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:395)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:390)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:390)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:381)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:587)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:582)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:582)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:555)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:555)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:359)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:955)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:351)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:162)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
59376410 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Cancelling stage 1
59376410 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Killing all running tasks in stage 1: Stage cancelled
59376415 [dispatcher-CoarseGrainedScheduler] ERROR org.apache.spark.scheduler.DAGSchedulerEventProcessLoop - DAGSchedulerEventProcessLoop failed; shutting down SparkContext
Scala code snippet:
val df = spark.read.parquet(data_path)
df.rdd.foreachPartition(p => {
  // code to process the partition...
})
Job submit:
spark-submit --master "spark://x.x.x.x:7077" --driver-cores=4 --driver-memory=112G --conf spark.driver.maxResultSize=0 --conf spark.rpc.message.maxSize=2047 --conf spark.driver.host=x.x.x.x --class myclass.processor --packages "..,org.apache.hadoop:hadoop-azure:3.3.1" --deploy-mode client
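Incidentally, the 67.0 GiB in the MemoryStore line is expected rather than a leak: Spark's unified memory manager reserves a fixed 300 MiB and then gives spark.memory.fraction (0.6 by default) of the remaining heap to storage and execution. A quick sanity check of that arithmetic, assuming the defaults:
val heapMiB    = 112L * 1024                // --driver-memory=112G
val unifiedMiB = (heapMiB - 300) * 0.6      // (heap - 300 MiB reserved) * spark.memory.fraction
println(f"${unifiedMiB / 1024}%.1f GiB")    // prints "67.0 GiB", matching the log line
So the driver is not demanding more than you gave it; it simply exposes about 67 GiB of the 112G heap to the MemoryStore. The IllegalStateException about dag-scheduler-event-loop is usually a secondary symptom of an earlier failure that shut the scheduler down, so the executor and driver logs just before it are the place to look.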

SocketTimeoutException when trying to run PySpark app from PyCharm

This is my first Python app that I'm trying to run on Spark. I have had no problems running Scala apps on a server or standalone before.
I start pyspark in another command window, like the following:
C:\Users\jesaremi>conda activate py3.6
(py3.6) C:\Users\jesaremi>pyspark --master local[1]
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
2019-01-23 09:05:45 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Python version 3.6.8 (default, Dec 30 2018 18:50:55)
SparkSession available as 'spark'.
And here's my Python script, which is copied over from somewhere else:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName('MyFirstStandaloneApp')
sc = SparkContext(conf=conf)
text_file = sc.textFile("./shakespeare.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
print("Number of elements: " + str(counts.count()))
counts.saveAsTextFile("./shakespeareWordCount")
The project interpreter in my PyCharm is set to the Python 3.6 environment I created myself, and it contains the important packages such as pyspark and py4j.
The result of the run is the following:
C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\envs\py3.6\python.exe D:/Projects/HelloSpark/Main.py
2019-01-23 09:17:27 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-01-23 09:17:28 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[Stage 0:> (0 + 1) / 1]Traceback (most recent call last):
File "C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\spark-2.4.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 25, in <module>
ModuleNotFoundError: No module named 'resource'
2019-01-23 09:17:40 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:103)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
... 18 more
Thanks.
Apparently PySpark 2.4.0 is broken on Windows (its Python worker imports the Unix-only resource module, which is exactly the ModuleNotFoundError above), and you need to downgrade to 2.3.2.
See this for more details:
No module named 'resource' installing Apache Spark on Windows

spark-submit failed due to Exception in thread "main" java.lang.NoSuchMethodException

I got the error information below when I tried to submit my Spark job for testing purposes.
jianrui@spark:~$ sudo $SPARK_HOME/bin/spark-submit --class com.test.spark.FirstScalaExample --master spark://spark.sparkstreaming.i10.internal.cloudapp.net:7077 /opt/spark/FirstScalaExample-0.0.1.jar
Exception in thread "main" java.lang.NoSuchMethodException: com.test.spark.FirstScalaExample.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:42)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-04-06 13:13:00 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-06 13:13:00 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-7f47cab1-f8b3-4731-bd67-e0d0ad013617
[Scala version - 2.11.6]
[Hadoop version - 2.7.5]
[Spark version - 2.3.0]
Note:
I specified the Hadoop native library via "export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native" in spark-env.sh, and I only extracted Spark rather than installing it.
I don't know what exactly the problem is.
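For what it's worth, spark-submit locates the application entry point via reflection (JavaMainApplication in the trace above), so NoSuchMethodException: com.test.spark.FirstScalaExample.main([Ljava.lang.String;) means the class exposes no static main(String[]) method. In Scala this typically happens when the entry point is declared as a class instead of an object. A minimal sketch of a valid entry point, with a placeholder workload:
package com.test.spark

import org.apache.spark.sql.SparkSession

// Must be an `object`: only then does `main` compile to the static method
// that spark-submit looks up by reflection.
object FirstScalaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FirstScalaExample").getOrCreate()
    println(s"Running on Spark ${spark.version}") // placeholder workload
    spark.stop()
  }
}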

Spark in cluster mode throws error if a SparkContext is not started

I have a Spark job that initializes the spark context only if it is really necessary:
val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs

if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}
It worked fine on YARN when the deploy mode is 'client':
spark-submit --master yarn --deploy-mode client
Then I switched the deploy mode to 'cluster', and it started to crash whenever jobs.isEmpty:
spark-submit --master yarn --deploy-mode cluster
Below is the error text:
INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: dev
     start time: 1509622629354
     final status: FAILED
     tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
     user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95
Is it a bug in YARN support, or am I missing something?
The SparkContext is what communicates with the cluster manager. If the application is submitted to the cluster but the context is never created, YARN cannot determine the state of the application; that is why you get an error.
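A minimal sketch of the corresponding workaround, assuming the job list can be computed without Spark: create the context unconditionally so the ApplicationMaster can register with YARN, and stop it cleanly when there is no work:
import org.apache.spark.{SparkConf, SparkContext}

object JobRunner {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf()) // always created, so YARN can track the app
    try {
      val jobs: List[String] = List.empty // placeholder for the real job list
      if (jobs.nonEmpty) {
        sc.parallelize(jobs).foreach(job => println(job)) // process each job
      }
    } finally {
      sc.stop() // the application can then finish with a SUCCEEDED status even when idle
    }
  }
}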

Apache Toree and Spark Scala Not Working in Jupyter

I'm having problems running Scala Spark on Jupyter. Below is the error message I get when I load an Apache Toree - Scala notebook in Jupyter.
root@ubuntu-2gb-sgp1-01:~# jupyter notebook --ip 0.0.0.0 --port 8888
[I 03:14:54.281 NotebookApp] Serving notebooks from local directory: /root
[I 03:14:54.281 NotebookApp] 0 active kernels
[I 03:14:54.281 NotebookApp] The Jupyter Notebook is running at: http://0.0.0.0:8888/
[I 03:14:54.281 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 03:14:54.282 NotebookApp] No web browser found: could not locate runnable browser.
[I 03:15:09.976 NotebookApp] 302 GET / (61.6.68.44) 1.21ms
[I 03:15:15.924 NotebookApp] Creating new notebook in
[W 03:15:16.592 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20161120031454 (61.6.68.44) 15.49ms referer=http://188.166.235.21:8888/notebooks/Untitled2.ipynb?kernel_name=apache_toree_scala
[I 03:15:16.677 NotebookApp] Kernel started: 94a63354-d294-4de7-a12c-2e05905e0c45
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.1.0.dev8-incubating-SNAPSHOT
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.10.4)
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.2
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:185)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:453)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:192)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:231)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:109)
at org.apache.toree.boot.layer.StandardBareInitialization$class.createActorSystem(BareInitialization.scala:71)
at org.apache.toree.Main$$anon$1.createActorSystem(Main.scala:35)
at org.apache.toree.boot.layer.StandardBareInitialization$class.initializeBare(BareInitialization.scala:60)
at org.apache.toree.Main$$anon$1.initializeBare(Main.scala:35)
at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:72)
at org.apache.toree.Main$delayedInit$body.apply(Main.scala:40)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at org.apache.toree.Main$.main(Main.scala:24)
at org.apache.toree.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[W 03:15:26.738 NotebookApp] Timeout waiting for kernel_info reply from 94a63354-d294-4de7-a12c-2e05905e0c45
When running the Scala shell, this is my output log:
root@ubuntu-2gb-sgp1-01:~# spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/20 03:17:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/20 03:17:12 WARN Utils: Your hostname, ubuntu-2gb-sgp1-01 resolves to a loopback address: 127.0.1.1; using 10.15.0.5 instead (on interface eth0)
16/11/20 03:17:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/11/20 03:17:13 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.15.0.5:4040
Spark context available as 'sc' (master = local[*], app id = local-1479611833426).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
This problem was highlighted before in JIRA: https://issues.apache.org/jira/browse/TOREE-336. However, I'm still unable to get it working for some reason.
I followed the instructions listed on their official site.
https://toree.apache.org/documentation/user/quick-start
This is my PATH:
root@ubuntu-2gb-sgp1-01:~# echo $PATH
/root/bin:/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/spark:/usr/local/spark/bin
Please note I didn't install Scala separately, as it comes with Spark.
Thanks
We haven't used Spark 2.0 with Scala 2.11 and notebooks in production yet.
The root cause of your error is compatibility: based on the Toree description on GitHub, the latest supported Scala version is 2.10.4, and you have 2.11.8.
Try downgrading to Scala 2.10 (that is, a Spark build that uses it), unless you have a production requirement to use only 2.11.
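For reference, the Toree quick-start installs the kernel roughly like this (a sketch, assuming a pip-managed Toree and that SPARK_HOME points at a Scala 2.10 Spark build; the exact flags can differ between Toree releases):
pip install --pre toree
jupyter toree install --spark_home=/usr/local/spark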