Compiling Spark MLlib using sbt - scala

I have been building some modifications inside Spark MLlib for a while now, and every time I want to compile Spark I have to do the following:
sbt update
sbt compile
sbt clean
sbt package
While this procedure produces what I want, I find it unnecessary to compile and package the other Spark modules. Is there a quick command to do what I want?

For the compile part, just run sbt compile, which is incremental by default.
You might be able to package subsets of the project with something like sbt mllib/package.
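If it helps, here is a rough sketch of that scoped invocation (assuming the Spark sbt build exposes MLlib under the project id mllib, which may differ across Spark versions):

sbt mllib/compile   # incrementally compile only the mllib subproject and its dependencies
sbt mllib/package   # build only the mllib jar

There is normally no need to run sbt clean between iterations; incremental compilation picks up the changed sources.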

Related

Compiling with scalac does not find sbt dependencies

I tried running my Scala code in the VSCode editor. I am able to run my script via the spark-submit command. But when I try to compile with scalac, I get:
.\src\main\scala\sample.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.{SQLContext,SparkSession}
I have already added the respective library dependencies to build.sbt.
Have you tried running sbt compile?
Running scalac directly means you're compiling only one file, without the benefits of sbt and especially the dependencies that you have added in your build.sbt file.
In an sbt project, there's no reason to use scalac directly; it defeats the purpose of sbt.
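For reference, a minimal build.sbt along these lines (the Scala and Spark versions below are placeholders; match them to your cluster):

// build.sbt - minimal sketch; adjust versions to your environment
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql"  % "2.4.8"
)

With that in place, running sbt compile from the project root resolves the Spark jars onto the compile classpath; a bare scalac invocation never sees them, which is exactly why the org.apache import fails.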

Building a Customized Spark

We are creating a customized version of Spark since we are changing some lines of code in ALS.scala. We build the customized Spark version using this mvn command:
./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
However, upon using the customized version of Spark, we run into this error:
Do you guys have some idea on what causes the error and how we might solve the issue?
I am actually using a jar file on the local machine, built with sbt (sbt compile then sbt clean package) and placed here: /Users/user/local/kernel/kernel-0.1.5-SNAPSHOT/lib.
However, in the Hadoop environment the installation is different, so I use Maven to build Spark, and that's where the error comes in. I suspect the error might be related to building Spark with Maven, as there are reports like this:
https://issues.apache.org/jira/browse/SPARK-2075
or possibly to how the Spark assembly files are built.
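If the goal is only to rebuild the modified MLlib module rather than a whole distribution, Maven can build a single submodule; a rough sketch (the module name, Scala suffix, and profiles below are assumptions for your Spark/Hadoop combination, and it presumes a prior full mvn install so the other modules are already in the local repository):

./build/mvn -pl :spark-mllib_2.10 -Phadoop-2.6 -Pyarn -DskipTests clean install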

How to Compile Apache Spark with Scala 2.11.1 using SBT?

I've been trying to compile Apache Spark with Scala 2.11.1 (the latest version at the time). However, each time I try, it ends up compiling everything against Scala 2.10.*, and I don't understand why.
The official documentation suggests using Maven for compilation, after switching to 2.11 with a script in the dev/ folder.
What if I wanted to use sbt instead?
You need to enable the scala-2.11 profile:
> sbt -Dscala-2.11=true
sbt> compile
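For a one-shot build, roughly the same thing non-interactively (the version-switch script name in dev/ varies across Spark releases, so treat it as a placeholder):

./dev/change-version-to-2.11.sh        # the dev/ script mentioned in the docs; name may differ in your version
sbt -Dscala-2.11=true clean package    # non-interactive equivalent of the two commands above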

Do you need to install Scala separately if you use sbt?

The reason I ask is that it's possible to specify a Scala version in the build.sbt file (using the scalaVersion setting), and once you do, sbt will automatically download that Scala version for the project.
I also seem to remember that despite having Scala 2.11.1 on my system, sbt would compile and run with Scala 2.10 if no version was specified.
So the question is: do I need to install Scala separately if I have sbt installed?
No, you don't need it; sbt will download Scala for you.
If you install sbt-extras (basically just a script), you don't even need to download sbt: it will automatically fetch the sbt launcher you need. Very handy, since you just specify sbt.version in your build.properties and you're good to go.
Edit: removed my comment about not being able to do sbt console in an empty directory, since both sbt and sbt-extras support it now.
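As a concrete sketch (the version numbers here are just examples), the only two files that pin versions are:

build.sbt:
  // sbt downloads this Scala version itself; no system-wide Scala install is needed
  scalaVersion := "2.11.1"

project/build.properties:
  # with sbt-extras this also determines which sbt launcher gets fetched
  sbt.version=0.13.18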

Compile Scala files from an sbt plugin

I am developing an sbt plugin. In this plugin I generate some new Scala sources packaged in an sbt project. Then I need to compile these new files programmatically so that I can add the generated classes to my classloader.
I cannot find any way in the sbt API to programmatically compile sources from a given sbt project path (and eventually load them through a classloader). Something as simple as the sbt command line (sbt compile) would be very convenient, something like:
XXX.compile(path/to/sbt/project)
Thanks
I suggest you have a look at sbt-boilerplate, which is an sbt plugin that generates code, works well, and is really simple.
Here's a link to the file that you probably want to take a look at.
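As an alternative to compiling the files yourself, the usual sbt route is to register a source generator from the plugin, so sbt compiles the generated sources as part of its normal compile task and the classes end up on the ordinary classpath. A minimal sketch, using sbt 0.13-style syntax with illustrative names:

// CodeGenPlugin.scala - sketch of an AutoPlugin that writes a generated source into sourceManaged
import sbt._
import Keys._

object CodeGenPlugin extends AutoPlugin {
  override def trigger = allRequirements

  override lazy val projectSettings = Seq(
    sourceGenerators in Compile += Def.task {
      // write the generated source under src_managed so sbt picks it up during compile
      val file = (sourceManaged in Compile).value / "generated" / "Generated.scala"
      IO.write(file, "object Generated { val answer = 42 }")
      Seq(file)
    }.taskValue
  )
}

Everything returned by the generator is compiled together with the rest of the project, which avoids having to drive the compiler or a custom classloader from the plugin.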