I wrote a program in Scala and created an executable JAR using the assembly command of sbt. Now I have to upload and run it on my platform.
For building the JAR I went through:
File -> Project Structure -> Project Settings -> Artifacts -> click the
green plus sign -> Jar -> From modules with dependencies...
I use the command:
spark-submit --class "ReadCSVwithnull" Scala.jar
but I get an error
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:284)
    at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:238)
My versions are: IntelliJ IDEA 2018.3.1
Spark 2.3.2
Scala 2.11.8
sbt 1.2.7
Deleting the signature files inside META-INF worked for me.
Use the command:
zip -d Scala.jar 'META-INF/*.RSA' 'META-INF/*.DSA' 'META-INF/*.SF'
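If you build the fat JAR with sbt-assembly, you can also avoid the problem at build time by discarding the signature files when dependency JARs are merged, so the zip -d step is no longer needed. A minimal sketch for build.sbt, assuming a recent sbt-assembly (older versions spell the key assemblyMergeStrategy in assembly):

// build.sbt (sketch): discard everything under META-INF in dependency JARs
// (signature files included); sbt-assembly writes its own manifest.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}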
Context: Scala/Spark application, fat JAR, Spark 3.3.0, Windows 10
IDE: IntelliJ IDEA 2022.2.2
Package: it..<...>.MyPackage
Main class: it..<...>.MyPackage.Application
JAR building: with sbt-assembly, or with Build Artifact (JAR)
Problem: in both cases (building the JAR with sbt or with IntelliJ), running
spark-submit --verbose --master local --class it.<company>.<...>.MyPackage.Application C:\<path to jar>\MyPackage.jar 10
fails with:
Error: Failed to load class it..<...>.MyPackage.Application.
22/09/20 23:14:25 INFO ShutdownHookManager: Shutdown hook called
End of the story.
Notes:
The same JAR, moved to an instance of Spark running on macOS, runs with no problem...
Thanks for any suggestion
Lorenzo
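One additional check worth doing here (my suggestion, not from the post above): list the contents of the assembled JAR and confirm that the compiled Application class is actually inside it, under the exact package path passed to --class, for example:

jar tf C:\<path to jar>\MyPackage.jar | findstr Application

If the class is missing, or sits under a different package path than the one given to --class, spark-submit reports exactly this "Failed to load class" error.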
I am trying to read a file from an S3 bucket in a Spark/Scala program, but it fails at the step that reads the file from S3, with the error below.
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
But the same code works with sbt on a Linux machine with the following steps:
project_dir# sbt clean
project_dir# sbt compile
project_dir# sbt run
If a JAR is created for the same code with the command below, that JAR does not work when it is executed using spark-submit:
project_dir# sbt package
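A likely explanation (my assumption, not stated above): sbt run puts every declared dependency on the classpath, while sbt package builds a thin JAR without them, so the S3A connector classes are missing when the JAR is run through spark-submit. One way to supply them is to declare hadoop-aws in build.sbt and bundle it into a fat JAR with sbt assembly; the version below is illustrative and should match the Hadoop version your Spark distribution was built with:

// build.sbt (sketch; the hadoop-aws version is illustrative)
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.3.2"

Alternatively, keep the thin JAR and let spark-submit fetch the connector at submit time with --packages org.apache.hadoop:hadoop-aws:3.3.2.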
I'm working on a Scala/Spark project. I would like to export my project to a JAR file and run it in Spark via spark-submit.
I tried this solution:
File -> Project Structure -> Artifacts -> + -> Jar -> From modules with dependencies -> Selected Main Class after browsing -> selected extract to the target jar -> Directory for META-INF automatically gets populated -> OK -> Apply -> OK -> Build -> Build Artifacts -> Build.
But I didn't find my main class in the JAR file, so I can't run it.
The basic idea that you can follow:
As you are working in Scala,
you can use sbt as your build management system to add all the dependencies to your project,
use the sbt-assembly plugin to build a fat JAR (a minimal sketch follows below),
and export this fat JAR to your cluster to submit the Spark jobs.
Please use Google to get more details...
Or you can use this project https://github.com/khodeprasad/spark-scala-examples to start with, and integrate the sbt-assembly plugin to create fat JARs by following its documentation: https://github.com/sbt/sbt-assembly
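For completeness, a minimal sketch of that setup (the names, versions, and main class below are illustrative, not taken from the projects linked above):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1")

// build.sbt
name := "my-spark-job"
scalaVersion := "2.12.17"
// "provided" because spark-submit supplies the Spark classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0" % "provided"
// the class that spark-submit --class should point at
Compile / mainClass := Some("com.example.Main")

Running sbt assembly then produces a fat JAR under target/scala-2.12/, which you pass to spark-submit --class com.example.Main on the cluster.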
I have built the project using sbt and imported it into Eclipse. While trying to execute it from a run configuration I cannot find my main class, and on running the application I am prompted with the following error: could not find the main class. I have installed spark-hadoop version 1.4 and Scala version 2.10.6 on the local machine, and also changed the Scala compiler version to 2.10.6 in Scala IDE. The same error is produced while trying to build Spark from Eclipse using Maven. Please advise.
Check if your main class is on the build path:
Project -> Properties -> Java Build Path -> Source tab
If not, add the appropriate folder(s).
Also add the folder where the classes are generated to the build path: configure build path -> Java Build Path -> Source -> "Default Output Folder" text box.
Check if you have .scala files included in the compilation path.
Even if you have the source folder included, it is possible that the .scala files are not being picked up.
The following might help; I was facing the same issue and it solved the problem for me:
build path -> configure build path -> Java Build Path -> Source -> add **/*.scala in the Included section.
I have a Spark project which I can run from sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libs are listed as provided in the build.sbt.
How do I configure things so that I can run the JAR from the command line, without having to use sbt console?
To run Spark standalone you need to build a Spark assembly.
Run sbt/sbt assembly in the Spark root dir. This will create assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar.
Then build your job JAR with dependencies (either with sbt assembly or the maven-shade-plugin).
You can use the resulting binaries to run your spark job from the command line:
ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob
Note: If you need a different HDFS version, you need to follow additional steps before building the assembly. See "About Hadoop Versions".
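If a Spark distribution is already installed, a simpler route (a sketch, reusing the example names above) is to let spark-submit put the Spark classes on the classpath, which is exactly what the provided scope expects:

spark-submit --class com.example.jobs.SparkJob job-jar-with-dependencies.jar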
Using the sbt-assembly plugin we can create a single JAR. After doing that you can simply run it using the java -jar command.
For more details, refer to the sbt-assembly documentation: https://github.com/sbt/sbt-assembly
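For java -jar to work, two things are needed beyond the plugin itself: the Spark dependencies must actually be inside the fat JAR (i.e. not marked provided), and the manifest needs a Main-Class entry. A minimal sketch, with illustrative coordinates and class name:

// build.sbt (sketch; version and class name are illustrative)
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"  // no "provided", so it gets bundled
assembly / mainClass := Some("com.example.jobs.SparkJob")            // written to the manifest as Main-Class

If you prefer to keep Spark as provided, run the assembled JAR through spark-submit instead, as shown above.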