Explanation of SBT build file - scala

Question
Is the .sbt file written in Scala or in an sbt proprietary language? Please help me decipher the sbt build definition.
lazy val root = <--- An instance of the sbt Project object? Why "lazy"? Is "root" a reserved keyword that sbt uses to identify the project in build.sbt?
(project in file(".")) <--- Is this defining a Project object for the current directory, which has the project structure sbt expects?
.settings( <--- Is this a Scala function call of def settings on the Project object?
name := "NQueen",
version := "1.0",
scalaVersion := "2.11.8",
mainClass in Compile := Some("NQueen")
)
libraryDependencies ++= Seq( <--- Is libraryDependencies a reserved keyword of type scala.collection.Seq, through which sbt downloads dependencies and packages them as part of the output jar?
"org.apache.spark" %% "spark-core" % "2.3.0", <--- Concatenating and creating the full library name including version. I suppose I need to look into respective documentations to find out what to specify.
"org.apache.spark" %% "spark-mllib" % "2.3.0"
)
// <--- Please explain what this block does and where I can find the explanations.
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Resources
Please suggest good resources to understand the design, the mechanism, and how .sbt works. I looked into the SBT Getting Started guide and the documents, but like the Scala definition itself, they are difficult to understand. With make, ant, or maven, how things get pieced together and the design/mechanism are so much clearer, but I need to find good documentation or tutorials for SBT.
References
I looked into the references below trying to understand.
SBT: How to get started using the Build.scala file (instead of build.sbt)
What is the difference between build.sbt and build.scala?
SBT - Build definition
SBT Project object
scala.collection.Seq
SBT Library dependencies
Spark 2.3 Quick Start

sbt can be really difficult for first-time users, and it's OK not to fully understand all of the definitions. It will become clearer over time.
Let me first simplify your build.sbt. It contains some unnecessary parts and will be easier to explain without them:
name := "NQueen"
version := "1.0"
scalaVersion := "2.11.8"
mainClass in Compile := Some("NQueen")
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" %% "spark-mllib" % "2.3.0"
)
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
And for your questions:
Is the .sbt file written in Scala or in an sbt proprietary language?
Well, it's both. You can do most Scala operations in an .sbt file: you can import and use external dependencies, write custom code, etc. But some things you can't do (define classes, for example).
It might also look like a dedicated language, but in reality it's just a DSL written in Scala (:=, in, %%, and % are all functions written in Scala).
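As a minimal illustration (plain Scala desugaring, not the actual sbt internals), operator-named methods are ordinary method calls, so the DSL lines above are regular Scala expressions:
// name := "NQueen" is just an ordinary method call on the `name` setting key:
name.:=("NQueen")
// and the dependency DSL is plain method calls on a String
// (made available through an implicit conversion), so
//   "org.apache.spark" %% "spark-core" % "2.3.0"
// desugars to
//   "org.apache.spark".%%("spark-core").%("2.3.0")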
Is libraryDependencies a reserved keyword of type scala.collection.Seq, through which sbt downloads dependencies and packages them as part of the output jar?
libraryDependencies is not a reserved keyword. You can think of it as a way to configure your project.
Writing libraryDependencies := Seq(..), you are basically setting the value of libraryDependencies.
But you are right about the meaning: it is the list of dependencies that should be downloaded.
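To make "setting the value" concrete, here is a minimal sketch of the common operators:
// := sets (replaces) the value of the setting:
libraryDependencies := Seq("org.apache.spark" %% "spark-core" % "2.3.0")
// += appends one dependency, ++= appends a whole Seq:
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"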
Concatenating and creating the full library name including the version. I suppose I need to look into the respective documentation to find out what to specify.
Keep in mind that %% and % are functions. You use those functions to specify which modules should be downloaded and added to the classpath.
You can find many dependencies (and their versions) on mvnrepository.
For example, for Spark: https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.3.0
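For example, with scalaVersion := "2.11.8", these two lines resolve to exactly the same artifact; %% just appends the Scala binary version suffix for you:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// is equivalent to spelling the suffix out by hand:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.3.0"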
Please explain what this block does and where I can find an explanation.
assemblyMergeStrategy is a setting coming from the sbt-assembly plugin.
That plugin allows you to pack your application into a single jar with all its dependencies.
You can read about the merge strategy here: https://github.com/sbt/sbt-assembly#merge-strategy
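For reference, a typical pattern (close to the one shown in the sbt-assembly README) discards the META-INF entries and falls back to the plugin's default strategy for everything else:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x =>
    // delegate everything else to the default merge strategy
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}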

Related

Spark-Scala build.sbt libraryDependencies UnresolvedDependency

I'm trying to import a dependency in my build.sbt file from here:
https://github.com/dmarcous/spark-betweenness.
When I hover over the error it says:
Expression type ModuleID must conform to Def.SettingsDefinition in SBT file
Unresolved Dependency
I am new to Scala, so my question may be silly. Thanks in advance.
It is still unclear how your build configuration looks, but the following build.sbt works (in the sense that it compiles and does not show the error that you mentioned):
name := "test-sbt"
organization := "whatever"
version := "1.0.0"
scalaVersion := "2.10.7"
libraryDependencies += "com.centrality" %% "spark-betweenness" % "1.0.0"
Alternatively, if you have a multi-project build, it could look like this:
lazy val root = project
.settings(
name := "test-sbt",
organization := "whatever",
version := "1.0.0",
scalaVersion := "2.10.7",
libraryDependencies += "com.centrality" %% "spark-betweenness" % "1.0.0"
)
However, you're probably going to find that it still does not work because it cannot resolve this dependency. Indeed, this library does not seem to be available in either Maven Central or jcenter. It is also very old: it appears to be published only for Scala 2.10 and a very old Spark version (1.5), so most likely you won't be able to use it with recent Spark environments (2.x and Scala 2.11).
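For completeness: if a library is published to a repository that sbt does not check by default, you would point sbt at it with a resolver. A minimal sketch (the URL is a placeholder, not a real location of this library):
// hypothetical repository; replace with wherever the artifact is actually hosted
resolvers += "Custom Repo" at "https://example.com/repo/"
libraryDependencies += "com.centrality" %% "spark-betweenness" % "1.0.0"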

Spark build.sbt file versioning

I am having a hard time understanding the multiple version numbers that go into the build.sbt file for Spark programs:
1. version
2. scalaVersion
3. spark version?
4. revision number.
There are also multiple compatibility constraints between these versions.
Can you please explain how to decide these versions for my project?
I hope the following SBT lines and their comments will be sufficient to answer your question.
// The version of your project itself.
// You can change this value whenever you want,
// e.g. every time you make a production release.
version := "0.1.0"
// The Scala version your project uses to compile.
// If you use Spark, you can only use a 2.11.x version.
// Also, because Spark includes its own Scala at runtime,
// I recommend you use the same one;
// you can check which one your Spark instance uses in the spark-shell.
scalaVersion := "2.11.12"
// The Spark version the project uses to compile.
// Because you won't generate an uber jar with Spark included,
// but deploy your jar to a Spark cluster instance,
// this version must match the remote one, unless you want weird bugs...
val SparkVersion = "2.3.1"
// Note: I use a val with the Spark version
// to make it easier to include several Spark modules in my project;
// this way, if I want/have to change the Spark version,
// I only have to modify one line
// and avoid strange errors because I changed some versions but not others.
// Also note the 'Provided' modifier at the end:
// it tells SBT that it shouldn't include the Spark bits in the generated jar,
// neither in the package nor in the assembly tasks.
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % SparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % SparkVersion % Provided,
)
// Exclude Scala from the assembly jar, because spark already includes it.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
You should also take care of the sbt version, that is, the version of sbt used in your project. You set it in the project/build.properties file:
sbt.version=1.2.3
Note:
I use the sbt-assembly plugin to generate a jar with all dependencies included except Spark and Scala. This is useful if you use other libraries, like the MongoSparkConnector, for example.
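As an aside, to check which Scala version your Spark instance uses (as mentioned in the scalaVersion comment above), you can ask the spark-shell REPL directly; the exact version printed depends on your installation:
scala> util.Properties.versionString
res0: String = version 2.11.12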

Scala error when running test in new application. Can't expand macros?

I am trying to learn the basics of Scala, scalatest, and sbt and I'm following a tutorial. This is my build.sbt file:
name := "demo-hello"
version := "0.1"
scalaVersion := "2.12.6"
libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.1.0" % "test"
I have a test that looks like this (showing this is probably unnecessary):
package demo
import org.scalatest.FunSuite
class HelloTest extends FunSuite {
test("say hello method works correctly") {
val hello = new Hello
assert(hello.sayHello("Scala") == "Hello, Scala!")
}
}
What should I do from here? I am trying to run the test but I get this error:
Error:(8, 36) can't expand macros compiled by previous versions of Scala
assert(hello.sayHello("Scala") == "Hello, Scala!")
I'm not that familiar with the % symbol, by the way.
FIX
I changed my build.sbt to this:
name := "demo-hello"
version := "0.1"
scalaVersion := "2.10"
libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.1.0" % "test"
Remaining questions:
So it seems downgrading to scalaVersion "2.10" worked. Why?
What is an artifact? scalatest is apparently an artifact?
Where is scalaVersion 2.10 kept on my machine? It seems I only have Scala 2.12. Where in my project folder is version 2.10?
Answering your questions slightly out-of-order:
2 - An "artifact" is something that's built by maven, sbt, or another build system. For Scala or Java, this is almost always a jar file. Each item in libraryDependencies specifies a file in a maven repository (a database of artifacts).
1 - Scala class files are not compatible between minor versions of Scala. When you download a Scala jar from a Maven repository, the Scala version is specified as part of the artifact name. The _2.10 in your dependency declares that you wish to use the version of scalatest that's compiled for Scala 2.10, which is why you were getting an error using it in your Scala 2.12 application.
When declaring dependencies on Scala artifacts in sbt, you should always use the %% operator, which automatically appends the appropriate suffix to your artifact, like so:
// This works for any scalaVersion setting.
libraryDependencies += "org.scalatest" %% "scalatest" % "2.1.0" % "test"
3 - sbt handles downloading the appropriate runtime files for the declared version of Scala automatically.
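Putting that together, a build.sbt that keeps Scala 2.12 and lets %% pick the matching artifact could look like the sketch below (3.0.5 is one ScalaTest release published for Scala 2.12; any 2.12-compatible version should work):
name := "demo-hello"
version := "0.1"
scalaVersion := "2.12.6"
// %% appends _2.12 automatically, so the test macros match the compiler version
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"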

How do I call a dependent library function from an sbt task?

I have a CLI tool written in Java which can modify some source code with the given params. For example, it can rename an enum value across a whole project.
I want to write an sbt task that can run this tool from my project dir with the given params, like sbt 'enums -rename A B'. My tool can be injected into the project through the sbt dependencies.
I skimmed through the book sbt in Action looking for an answer, but those examples are not this specific.
My build.sbt (far from working):
name := """toolTestWithActivator"""
version := "1.0-SNAPSHOT"
resolvers += "Local Repository" at "file://C:/Users/torcsi/.ivy2/local"
lazy val root = (project in file(".")).enablePlugins(PlayJava)
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"tool" % "tool_2.11" % "1.0",
javaJdbc,
javaEbean,
cache,
javaWs
)
val mytool = taskKey[String]("mytool")
mytool := {
com.my.tool.Main
}
Can sbt handle this type of task/dependency structure, or do I need to do this another way?
sbt is recursive: it compiles the .sbt files and the .scala files under the project folder and uses those to execute your build (in fact, you can see sbt as a library that helps you produce builds).
So, as you need your library to define a task, it is a dependency of your build.sbt file (and not a dependency of your project).
To declare that the build.sbt file depends on your library, just create a .sbt file in the project folder; for example:
project/dependencies.sbt
libraryDependencies += "tool" %% "tool" % "1.0"
and in build.sbt add:
val mytool = taskKey[Unit]("mytool")
mytool := {
com.my.tool.Main.main(Array.empty[String])
}
Some comments:
Be careful with the Scala version used: sbt 0.13 is compiled with Scala 2.10, so your library should also be compiled for Scala 2.10 (the package should be tool_2.10). The new sbt 1.0 is compiled with Scala 2.12.
I used the %% notation so that sbt adds the expected Scala version by itself.
I assumed your CLI tool defines a classic Java main method (or the Scala equivalent), so the argument is an Array of String (here an empty one) and it returns Unit (void in Java).
Some references to understand the solution:
http://www.scala-sbt.org/0.13/docs/Organizing-Build.html
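One more note: a taskKey takes no command-line arguments, so to get the sbt 'enums -rename A B' usage from the question you would use an inputKey instead. A sketch, assuming the tool's entry point is com.my.tool.Main as above:
import sbt.complete.DefaultParsers._

val mytool = inputKey[Unit]("Runs the CLI tool with the given arguments")
mytool := {
  // parse the whitespace-separated arguments typed after the task name
  val args: Seq[String] = spaceDelimited("<arg>").parsed
  com.my.tool.Main.main(args.toArray)
}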

How to change SBT's rules on generating URLs for Maven repositories?

By default, the Scala Build Tool (sbt) has a set of rules on how to generate URLs when looking up dependencies. For example, if I have the following build file,
// Project settings
name := "MyProject"
version := "0.1"
organization := "com.me"
scalaVersion := "2.8.1"
// Dependencies
libraryDependencies ++= Seq(
"com.google.guava" %% "guava" % "r09"
)
// Repositories
resolvers += "Maven Central Server" at "http://repo1.maven.org/maven2"
Then SBT attempts to find guava at the following URL,
http://repo1.maven.org/maven2/com/google/guava/guava_2.8.1/r09/guava_2.8.1-r09.pom
However, the library I'm looking for in this case isn't even made for Scala, so combining the Scala version just doesn't make sense here. How can I tell SBT what the format is for generating URLs for use with Maven repositories?
EDIT
While it seems that it is possible to edit the layout like so,
Resolver.url("Primary Maven Repository",
new URL("http://repo1.maven.org/maven2/"))( Patterns("[organization]/[module]/[module]-[revision].[ext]") )
the "[module]" keyword is predefined to be the (artifact id)_(scala version) and the "[artifact]" keyword is just "ivy", leaving me back at square one.
As far as I remember, "%%" appends the Scala version and "%" does not. Try:
libraryDependencies ++= Seq(
"com.google.guava" % "guava" % "r09"
)
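To see why, compare the URLs that the two operators make sbt request (the second path is what Maven Central actually hosts for this artifact):
// "com.google.guava" %% "guava" % "r09" asks for
//   .../com/google/guava/guava_2.8.1/r09/guava_2.8.1-r09.pom (does not exist)
// "com.google.guava" %  "guava" % "r09" asks for
//   .../com/google/guava/guava/r09/guava-r09.pom (the real artifact)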
Check the last paragraph (Custom Layout) of the official sbt wiki here.
Basically SBT allows you to use this syntax:
resolvers += Resolver.url("my-test-repo", url)( Patterns("[organisation]/[module]/[revision]/[artifact].[ext]") )