Cannot load main class from JAR file - scala

I have a Spark Scala application. I tried to display a simple message, "Hello my App". When I compile it with sbt compile and run it with sbt run, it works: my message is displayed, but an error is displayed as well, like this:
Hello my application!
16/11/27 15:17:11 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
16/11/27 15:17:11 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
[success] Total time: 13 s, completed Nov 27, 2016 3:17:12 PM
16/11/27 15:17:12 INFO DiskBlockManager: Shutdown hook called
I can't understand whether it's fine or not!
Also, when I try to submit my JAR file after the run, it displays an error.
My command line looks like this:
spark-submit "appfilms" --master local[4] target/scala-2.11/system-of-recommandation_2.11-1.0.jar
And the error is:
Error: Cannot load main class from JAR file:/root/projectFilms/appfilms
Run with --help for usage help or --verbose for debug output
16/11/27 15:24:11 INFO Utils: Shutdown hook called
Please can you answer me!

The InterruptedException errors at the end of sbt run occur because the SparkContext is not stopped before the program exits; stopping it is required in Spark 2.x and higher.
To prevent this error, stop the context explicitly by calling stop() on your SparkContext instance, e.g. sc.stop(). This is based on my own experience and the following sources: Spark Context, Spark Listener Bus error
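A minimal sketch of what that could look like, assuming a standalone application with its own main method. The object name HelloApp, the app name, and the local[2] fallback master are illustrative assumptions, not taken from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object; the name is an assumption for illustration.
object HelloApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("hello-app") // assumed app name
      // Fallback only: a master passed via spark-submit --master is kept.
      .setIfMissing("spark.master", "local[2]")
    val sc = new SparkContext(conf)
    try {
      println("Hello my application!")
    } finally {
      // Stop the context explicitly so its background threads shut down
      // cleanly before the JVM (or sbt) exits.
      sc.stop()
    }
  }
}
```

The try/finally is there so that stop() still runs if the job body throws.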

You forgot the --class parameter. Instead of
spark-submit "appfilms" --master local[4] target/scala-2.11/system-of-recommandation_2.11-1.0.jar
use
spark-submit --class "appfilms" --master local[4] target/scala-2.11/system-of-recommandation_2.11-1.0.jar
Please note that if appfilms belongs to a package, don't forget to add the package name, as below:
packagename.appfilms
I believe this will suffice.
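To illustrate the package case, here is a sketch that assumes the object lives in a hypothetical package called com.example.films (the package name and file path are invented for this example; the question does not show them):

```scala
// File: src/main/scala/com/example/films/appfilms.scala (hypothetical path)
package com.example.films // assumed package name, for illustration only

object appfilms {
  def main(args: Array[String]): Unit = {
    println("Hello my application!")
  }
}

// With this layout, the value to pass to --class is the fully qualified name:
//   com.example.films.appfilms
```

spark-submit treats the first non-option argument as the application JAR, which is why the original command reported "Cannot load main class from JAR file:/root/projectFilms/appfilms". Options such as --class and --master therefore go before the JAR path.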


Container killed by YARN for exceeding memory limits in Spark Scala

I am facing the error below while running my Spark Scala code with the spark-submit command.
ERROR cluster.YarnClusterScheduler: Lost executor 14 on XXXX: Container killed by YARN for exceeding memory limits. 55.6 GB of 55 GB physical memory used.
The line of code that throws the error is below:
df.write.mode("overwrite").parquet("file")
I am writing a Parquet file. It was working until yesterday; I am not sure why, but since the last run it throws this error with the same input file.
Thanks,
Naveen
Running with the conf below in the spark-submit command resolved the issue, and the code ran successfully.
--conf spark.dynamicAllocation.enabled=true
Thanks,
Naveen

Must the Scala-Spark developer install Spark and Hadoop on his computer?

I have installed a Hadoop + Spark cluster on the servers.
Writing Scala code in the spark-shell on the master server works fine.
I put the Spark library (the jar files) in my project and I'm writing my first Scala code on my computer in IntelliJ.
When I run simple code that just creates a SparkContext object, in order to read a file from HDFS through the hdfs protocol, it outputs error messages.
The test function:
import org.apache.spark.SparkContext
class SpcDemoProgram {
  def demoPrint(): Unit = {
    println("class spe demoPrint")
    test()
  }
  def test(): Unit = {
    val spark = new SparkContext()
  }
}
The messages are:
20/11/02 12:36:26 INFO SparkContext: Running Spark version 3.0.0
20/11/02 12:36:26 WARN Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.(Shell.java:689)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1664)
at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:104)
at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:88)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
at org.apache.spark.SparkContext.(SparkContext.scala:303)
at org.apache.spark.SparkContext.(SparkContext.scala:120)
at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.(Shell.java:516)
... 19 more
20/11/02 12:36:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/02 12:36:27 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.(SparkContext.scala:380)
at org.apache.spark.SparkContext.(SparkContext.scala:120)
at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
20/11/02 12:36:27 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.(SparkContext.scala:380)
at org.apache.spark.SparkContext.(SparkContext.scala:120)
at scala.spc.demo.SpcDemoProgram.test(SpcDemoProgram.scala:14)
at scala.spc.demo.SpcDemoProgram.demoPrint(SpcDemoProgram.scala:9)
at scala.spc.demo.SpcDemoProgram$.main(SpcDemoProgram.scala:50)
at scala.spc.demo.SpcDemoProgram.main(SpcDemoProgram.scala)
Does that error message imply that Hadoop and Spark must be installed on my computer?
What configuration do I need to do?
I assume you are trying to read the file with a path such as hdfs://<FILE_PATH>. If so, then yes, you need to have Hadoop installed; if it is just a local directory, you could try without "hdfs://" in the file path.
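Independently of where the file lives, the fatal exception in the log above is "A master URL must be set in your configuration", which is raised because new SparkContext() is created without a master. A minimal sketch that sets one, assuming you want the driver to run in local mode from the IDE; the object name, the app name, and the choice of local[*] are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object SpcDemoLocal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spc-demo") // assumed app name
      .setMaster("local[*]")  // assumption: run locally, using all available cores
    val sc = new SparkContext(conf)
    try {
      // Small in-memory job, just to confirm that the context works.
      val total = sc.parallelize(1 to 5).reduce(_ + _)
      println(s"sum = $total")
    } finally {
      sc.stop()
    }
  }
}
```

To run against the cluster instead, the master URL of that cluster would go in place of local[*]; which URL that is depends on how the cluster was set up.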

Failed to submit local jar to spark cluster: java.nio.file.NoSuchFileException

~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/20 16:41:30 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: Submission successfully created as driver-20170620204130-0005. Polling submission state...
17/06/20 16:41:31 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170620204130-0005 in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: State of driver driver-20170620204130-0005 is now ERROR.
17/06/20 16:41:31 INFO RestSubmissionClient: Driver is running on worker worker-20170620203037-172.17.0.5-45429 at 172.17.0.5:45429.
17/06/20 16:41:31 ERROR RestSubmissionClient: Exception from the cluster:
java.nio.file.NoSuchFileException: /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
java.nio.file.Files.copy(Files.java:1274)
org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:608)
org.apache.spark.util.Utils$.copyFile(Utils.scala:579)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:664)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:463)
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:154)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:172)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
17/06/20 16:41:31 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20170620204130-0005",
"serverSparkVersion" : "2.1.1",
"submissionId" : "driver-20170620204130-0005",
"success" : true
}
Log from spark-worker:
2017-06-20T20:41:30.807403232Z 17/06/20 20:41:30 INFO Worker: Asked to launch driver driver-20170620204130-0005
2017-06-20T20:41:30.817248508Z 17/06/20 20:41:30 INFO DriverRunner: Copying user jar file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.883645747Z 17/06/20 20:41:30 INFO Utils: Copying /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.885217508Z 17/06/20 20:41:30 INFO DriverRunner: Killing driver process!
2017-06-20T20:41:30.885694618Z 17/06/20 20:41:30 WARN Worker: Driver driver-20170620204130-0005 failed with unrecoverable exception: java.nio.file.NoSuchFileException: home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
Any idea why? Thanks
UPDATE
Is the following command right?
./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
UPDATE
I think I understand a little more about Spark now, and why I had this problem as well as the spark-submit error: ClassNotFoundException. The key point is that, although the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application JAR is not uploaded to the cluster on submission, which is different from what I had understood. So the Spark cluster cannot find the application JAR and cannot load the main class.
I will try to find out how to set up the Spark cluster and use cluster mode to submit the application. I have no idea whether client mode will use more resources for streaming jobs.
> UPDATE
> I think I understand a little more about Spark now, and why I had this problem as well as the spark-submit error: ClassNotFoundException. The key point is that, although the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application JAR is not uploaded to the cluster on submission, which is different from what I had understood. So the Spark cluster cannot find the application JAR and cannot load the main class.
That's why you have to place the JAR file on the master node or put it into HDFS before running spark-submit.
This is how to do it:
1.) Transfer the file to the master node with the scp command:
$ scp <file> <username>@<IP address or hostname>:<Destination>
For example:
$ scp mytext.txt tom@128.140.133.124:~/
2.) Transfer the file to HDFS:
$ hdfs dfs -put mytext.txt
Hope this helps.
You are submitting the application in cluster mode. This means the Spark driver will be created somewhere in the cluster, and the file must exist there.
That is why, with Spark, it is recommended to use a distributed file system such as HDFS or S3.
In standalone cluster mode, the JAR file should be put on HDFS, because the driver can be launched on any node in the cluster.
hdfs dfs -put xxx.jar /user/
spark-submit --master spark://xxx:7077 \
--deploy-mode cluster \
--supervise \
--driver-memory 512m \
--total-executor-cores 1 \
--executor-memory 512m \
--executor-cores 1 \
--class com.xiyou.bi.streaming.game.common.DmMoGameviewOnlineLogic \
hdfs://xxx:8020/user/hutao/xxx.jar

Running spark job on yarn using remote SparkContext: Yarn application has already ended

I'm trying to launch a program that creates SparkContext on yarn. Here is my simple program:
object Entry extends App {
  System.setProperty("SPARK_YARN_MODE", "true")
  val sparkConfig = new SparkConf()
    .setAppName("test-connection")
    .setMaster("yarn-client")
  val sparkContext = new SparkContext(sparkConfig)
  val numbersRDD = sparkContext.parallelize(List(1, 2, 3, 4, 5))
  println {
    s"result is ${numbersRDD.reduce(_ + _)}"
  }
}
build.sbt
scalaVersion := "2.10.5"
libraryDependencies ++= {
  val sparkV = "1.6.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkV,
    "org.apache.spark" %% "spark-yarn" % sparkV
  )
}
I'm using Google Cloud Dataproc, running this program on the master node via sbt run.
These are the logs:
16/03/09 08:38:31 INFO YarnClientImpl: Submitted application application_1457497836188_0013 to ResourceManager at /0.0.0.0:8032
16/03/09 08:38:32 INFO Client: Application report for application_1457497836188_0013 (state: ACCEPTED)
16/03/09 08:38:32 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1457512711191
final status: UNDEFINED
tracking URL: http://recommendation-cluster-m:8088/proxy/application_1457497836188_0013/
user: ibosz
16/03/09 08:38:33 INFO Client: Application report for application_1457497836188_0013 (state: ACCEPTED)
16/03/09 08:38:34 INFO Client: Application report for application_1457497836188_0013 (state: ACCEPTED)
16/03/09 08:38:35 INFO Client: Application report for application_1457497836188_0013 (state: FAILED)
16/03/09 08:38:35 INFO Client:
client token: N/A
diagnostics: Application application_1457497836188_0013 failed 2 times due to AM Container for appattempt_1457497836188_0013_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://recommendation-cluster-m:8088/cluster/app/application_1457497836188_0013Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/home/ibosz/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.6.0.jar does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1457512711191
final status: FAILED
tracking URL: http://recommendation-cluster-m:8088/cluster/app/application_1457497836188_0013
user: ibosz
16/03/09 08:38:35 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at Entry$delayedInit$body.apply(Entry.scala:13)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at Entry$.main(Entry.scala:6)
at Entry.main(Entry.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sbt.Run.invokeMain(Run.scala:67)
at sbt.Run.run0(Run.scala:61)
at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Logger$$anon$4.apply(Logger.scala:85)
at sbt.TrapExit$App.run(TrapExit.scala:248)
at java.lang.Thread.run(Thread.java:745)
It says:
java.io.FileNotFoundException: File file:/home/ibosz/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.6.0.jar does not exist
But the file does exist, and I have no problem running spark-shell --master yarn-client. What's wrong with my code?
While there's probably a way to force sbt run to correctly do a real yarn-client mode Spark submission, you probably just want to do this instead:
sbt package
spark-submit target/scala-2.10/*SNAPSHOT.jar
Essentially, the error you're encountering is that when the SparkContext gets created, it asks for a remote YARN container to hold the AppMaster process, which will reside on one of your worker nodes. It's passing through aspects of your master's local environment, which includes your sbt-specific copy of the Spark assembly that was used in the build (under the ~/.ivy2/cache/ directory). The workers' environments won't match the environment in which you're running sbt run, which is why it fails.
Note that the spark-submit command is itself just a bash script whose whole purpose is to run the jarfile with all the right environment-variable and classpath configurations, so anything that gets sbt run to work will essentially duplicate the logic of the spark-submit script, and probably do it in a non-portable way.
The plus side of all this is that using spark-submit foo.jar will make your invocation nice and portable; once you want to productionize your job for example, you can use Dataproc's job-submission APIs on that same jarfile just like you'd use spark-submit: gcloud dataproc jobs submit spark --jar foo.jar <your_job_args>, and you can even submit those Spark jarfiles through Dataproc's web GUI just by uploading your jarfile to GCS first and then specifying the gs:// path to your jarfile for the job.
Similarly, if you have a local spark setup just by untarring a standard Spark binary distro, you can still use spark-submit even if you don't have sbt installed on that local spark setup.
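If you go the spark-submit route, the program itself can drop the hard-coded master and let spark-submit (or the cluster's defaults) supply it. A sketch under that assumption; it also uses an explicit main method instead of extends App and stops the context at the end, which are choices made for this example rather than something the answer above requires:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Entry {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master is assumed to come from spark-submit
    // (for example --master) or from the cluster's Spark defaults.
    val sparkConfig = new SparkConf().setAppName("test-connection")
    val sparkContext = new SparkContext(sparkConfig)
    try {
      val numbersRDD = sparkContext.parallelize(List(1, 2, 3, 4, 5))
      println(s"result is ${numbersRDD.reduce(_ + _)}")
    } finally {
      sparkContext.stop()
    }
  }
}
```

Run outside spark-submit with no master configured anywhere, this version would fail at SparkContext creation, so it only makes sense together with the sbt package and spark-submit steps above.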

Spark job using HBase fails

Any Spark job I run that involves HBase access results in the errors below. My own jobs are in Scala, but the supplied Python examples fail the same way. The cluster is Cloudera, running CDH 5.4.4. The same jobs run fine on a different cluster with CDH 5.3.1.
Any help is greatly appreciated!
...
15/08/15 21:46:30 WARN TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
...
15/08/15 21:46:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, some.server.name): java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:163)
...
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:389)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:158)
... 14 more
Run spark-shell with these parameters:
--driver-class-path .../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar --driver-java-options "-Dspark.executor.extraClassPath=.../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar"
Why it works is described here.