I'm working on a Scala/Spark project, and I would like to export my project to a jar file and run it in Spark via spark-submit.
I tried this solution:
File -> Project Structure -> Artifacts -> + -> JAR -> From modules with dependencies -> selected the main class after browsing -> selected "extract to the target JAR" -> the directory for META-INF gets populated automatically -> OK -> Apply -> OK -> Build -> Build Artifacts -> Build.
But I didn't find my main class in the jar file, so I can't run it.
The basic idea you can follow:
As you are working in Scala,
you can use sbt as your build management system to add all the dependencies to your project,
you can use the sbt-assembly plugin to build a fat jar,
and export this fat jar to your cluster to submit your Spark jobs.
Please search the web for more details,
or you can start with this project https://github.com/khodeprasad/spark-scala-examples and integrate the sbt-assembly plugin to create fat jars by following its documentation: https://github.com/sbt/sbt-assembly
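For reference, a minimal sbt-assembly setup could look like the sketch below; the project name, Scala version, and library versions are illustrative and not taken from the question.

project/plugins.sbt

// Plugin version is illustrative; check the sbt-assembly documentation for a current one.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

build.sbt

name := "my-spark-job"

version := "0.1"

scalaVersion := "2.11.12"

// Spark is already installed on the cluster, so mark it "provided" to keep the fat jar small.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"

Running sbt assembly then produces a single jar under target/scala-2.11/ that you can pass to spark-submit, with --class pointing at your main object.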
[screenshot of the error]
I am getting this error while running Spark with Scala. Can anyone suggest how I can solve this issue?
You have not created this as a Maven project. What you need to do is delete the project from Eclipse, then:
File -> Import -> Maven -> Existing Maven Project -> Select your folder.
This will load your project and include the Maven dependencies, and you will not face those errors.
I am trying to use an LSH implementation in Scala (https://github.com/marufaytekin/lsh-spark) in my Spark project. I cloned the repository with some changes to the sbt file (added an organisation).
To use this implementation, I compiled it using sbt compile, moved the jar file to the "lib" folder of my project, and updated my project's sbt configuration file, which looks like this:
Now when I try to compile my project using sbt compile, it fails to load the external jar file, showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am I following the right steps for adding an external jar file?
How do I solve the dependency issue?
As an alternative, you can build the lsh-spark project and add the jar to your Spark application.
To add external jars, the SparkContext.addJar method (or the --jars option of spark-submit) can be used when running the Spark application. Refer to Running Spark Applications on YARN.
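A minimal sketch of the addJar approach (the jar path, app name, and object name below are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object LshApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("lsh-example")
    val sc = new SparkContext(conf)
    // Ship the pre-built lsh-spark jar to the executors at runtime.
    sc.addJar("/path/to/lsh-spark_2.10-0.0.1-SNAPSHOT.jar")
    // ... run the LSH computation here ...
    sc.stop()
  }
}

Equivalently, the jar can be passed on the command line through the --jars option of spark-submit.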
This issue isn't related to Spark but to the sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar to the lib folder, as explained here; the lib folder should be at the same level as build.sbt (cf. this post).
You might also want to check out this SO post.
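For reference, a rough sketch of such a build.sbt (names and versions are illustrative). Jars dropped into lib/ are unmanaged dependencies and are put on the classpath automatically, so the local jar itself needs no libraryDependencies entry.

build.sbt

name := "my-spark-app"

scalaVersion := "2.10.6"

// Managed dependencies (resolved from repositories) still go here;
// the lsh-spark jar sitting in lib/ is picked up without being listed.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"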
I want to modify an algorithm from the Spark source code. In Eclipse Luna, I tried to import the source code via File -> Import -> General -> Existing Projects into Workspace, but after that the src folder does not contain any files. How should I go about it?
The Spark project consists of multiple modules, listed in its top-level pom.xml:
pom.xml
<modules>
  <module>common/sketch</module>
  <module>common/network-common</module>
  <module>common/network-shuffle</module>
  <module>common/unsafe</module>
  <module>common/tags</module>
  <module>core</module>
  <module>graphx</module>
  <module>mllib</module>
  <module>tools</module>
  <module>streaming</module>
  <module>sql/catalyst</module>
  <module>sql/core</module>
  <module>sql/hive</module>
  <module>external/docker-integration-tests</module>
  <module>assembly</module>
  <module>examples</module>
  <module>repl</module>
  <module>launcher</module>
  <module>external/kafka</module>
  <module>external/kafka-assembly</module>
</modules>
If you want to import the complete Spark project, try this:
File -> Import -> Select -> Maven -> Existing Maven Projects -> (Select the root directory of Spark project)
Note: Make sure you already have Maven integration for Eclipse (m2e) installed.
cd to the home directory of your Spark source code.
Run:
mvn eclipse:eclipse
This will convert your Spark project into an Eclipse Maven project.
Then:
File -> Import -> Select -> Maven -> Existing Maven Projects -> (Select the root directory of Spark project)
I have built the project using sbt and imported it into Eclipse. When trying to execute it from a run configuration, I cannot find my main class, and on running the application I am prompted with the following error: could not find the main class. I have installed spark-hadoop version 1.4 and Scala version 2.10.6 on my local machine, and I have also changed the Scala compiler version to 2.10.6 in the Scala IDE. The same error is produced while trying to build Spark from Eclipse using Maven. Please advise.
Check if your main class is on the build path.
Project -> Properties -> Java Build Path -> Source tab
If not, add the appropriate folder(s).
Also add the folder where the classes are generated to the build path: Configure Build Path -> Java Build Path -> Source -> the "Default Output Folder" text box.
Check if you have the .scala files included in the compilation path.
Even if you have the source folder included, it is possible that the .scala files are not being picked up.
The following might help; I was facing the same issue and it solved the problem for me:
Build path -> Configure Build Path -> Java Build Path -> Source -> add **/*.scala in the Included section.
I just downloaded the Groovy plugin for Eclipse 4.2 from http://dist.springsource.org/release/GRECLIPSE/e4.2/. I don't have any other installation/library for Groovy on my system. I am able to run Groovy programs on my machine in Eclipse.
However, when I try to import org.junit.Test, I get the following errors:
Groovy:class org.junit.Test is not an annotation in #org.junit.Test
Groovy:unable to resolve class org.junit.Test
Can anyone tell me what might be the issue?
You must add the JUnit jars to your classpath. Select the project -> Build Path -> Configure Build Path... -> Libraries -> Add Library -> JUnit -> Next -> JUnit 4 -> OK.