SBT - Multi-project merge strategy and build.sbt structure when using assembly - Scala

I have a project that consists of multiple smaller projects, some with dependencies upon each other; for example, a utilities project depends upon a commons project.
Other projects may depend upon utilities, upon commons, upon both, or upon neither.
In build.sbt I have the assembly merge strategy at the end of the file, along with test in assembly := {}.
My question is: is this correct? Should each project have its own merge strategy, and if so, will the projects that depend on it inherit that strategy? Repeating the merge strategy in every project definition seems clunky and would mean a lot of duplicated code.
The same question applies to the tests: should each project have the line controlling whether tests run during assembly, or will that also be inherited?
Thanks in advance. If anyone knows of a link to a sensible (relatively complex) example, that would also be great.

In my day job I currently work on a large multi-project build. Unfortunately it's closed source, so I can't share specifics, but I can share some guidance.
Create a rootSettings used only by the root/container project, since the root usually isn't part of an assembly or publish step. It would contain something like:
lazy val rootSettings = Seq(
  publishArtifact := false,
  publishArtifact in Test := false
)
Create a commonSettings shared by all the subprojects. Place the base/shared assembly settings here:
lazy val commonSettings = Seq(
  // We use a common directory for all of the artifacts
  assemblyOutputPath in assembly := baseDirectory.value /
    "assembly" / (name.value + "-" + version.value + ".jar"),
  // This is really a one-time, global setting if all projects
  // use the same folder, but should be here if modified for
  // per-project paths.
  cleanFiles <+= baseDirectory { base => base / "assembly" },
  test in assembly := {},
  assemblyMergeStrategy in assembly := {
    case "BUILD" => MergeStrategy.discard
    case "logback.xml" => MergeStrategy.first
    case other: Any => MergeStrategy.defaultMergeStrategy(other)
  },
  assemblyExcludedJars in assembly := {
    val cp = (fullClasspath in assembly).value
    cp filter { _.data.getName.matches(".*finatra-scalap-compiler-deps.*") }
  }
)
Each subproject uses commonSettings, and applies project-specific overrides:
lazy val fubar = project.in(file("fubar-folder-name"))
  .settings(commonSettings: _*)
  .settings(
    // Project-specific settings here.
    assemblyMergeStrategy in assembly := {
      // The fubar-specific strategy
      case "fubar.txt" => MergeStrategy.discard
      case other: Any =>
        // Apply the inherited "common" strategy
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(other)
    }
  )
  .dependsOn(
    yourCoreProject,
    // ...
  )
And BTW, if using IntelliJ, don't name your root project variable root, as that is what appears as the project name in the recent projects menu.
lazy val myProjectRoot = project.in(file("."))
  .settings(rootSettings: _*)
  .settings(
    // ...
  )
  .dependsOn(
    // ...
  )
  .aggregate(
    fubar,
    // ...
  )
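Because the root project aggregates the subprojects, running a task such as assembly or clean at the root also runs it in each aggregated subproject, which is what makes the shared commonSettings pay off without per-project invocations.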
You may also need to add a custom strategy for combining reference.conf files (for the Typesafe Config library). In HOCON, later duplicate keys override earlier ones, so reversing the concatenation order changes which definitions win:
val reverseConcat: sbtassembly.MergeStrategy = new sbtassembly.MergeStrategy {
  val name = "reverseConcat"
  def apply(tempDir: File, path: String, files: Seq[File]): Either[String, Seq[(File, String)]] =
    MergeStrategy.concat(tempDir, path, files.reverse)
}

assemblyMergeStrategy in assembly := {
  case "reference.conf" => reverseConcat
  case other => MergeStrategy.defaultMergeStrategy(other)
}

Related

Generate Executable Jar for a Scala Project

I have a Scala project which contains multiple main methods. I want to generate a fat jar so that I can run the code belonging to one of the main methods.
build.sbt
lazy val commonSettings = Seq(
  version := "0.1-SNAPSHOT",
  organization := "my.company",
  scalaVersion := "2.11.12",
  test in assembly := {}
)

lazy val app = (project in file("app")).
  settings(commonSettings: _*).
  settings(
    mainClass in assembly := Some("my.test.Category")
  )

assemblyMergeStrategy in assembly := {
  case "reference.conf" => MergeStrategy.concat
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}
plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
With this, the manifest file is generated successfully in my resource folder.
Next I run sbt assembly and the executable jar is generated successfully.
When I run java -jar category-assembly-0.1.jar I get the following error:
no main manifest attribute, in category-assembly-0.1.jar
I have tried many of the steps suggested on the internet, but I keep getting this error.
UPDATE
Currently the following is included in my build.sbt:
lazy val spark: Project = project
  .in(file("./spark"))
  .settings(
    mainClass in assembly := Some("my.test.Category"),
    mainClass in (Compile, packageBin) := Some("my.test.Category"),
    mainClass in (Compile, run) := Some("my.test.Category")
  )

assemblyMergeStrategy in assembly := {
  case "reference.conf" => MergeStrategy.concat
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}
After building the artifacts, running sbt assembly, and trying the generated jar, I am still getting the same error:
no main manifest attribute, in category-assembly-0.1.jar
I had the same issue when I was deploying on EMR.
Add the following to your settings in build.sbt and it will be alright:
lazy val spark: Project = project
  .in(file("./spark"))
  .settings(
    ...
    // set the main class for packaging the main jar
    mainClass in (Compile, packageBin) := Some("com.orgname.name.ClassName"),
    mainClass in (Compile, run) := Some("com.orgname.name.ClassName"),
    ...
  )
This essentially sets the named class as the default mainClass of your project. I set it for both packageBin and run, so you should be alright.
Just don't forget to rename com.orgname.name.ClassName to your own class name.
(This is just to refresh your memory.)
A fully qualified class name consists of <package name>.<class name>.
For example:
package com.orgname.name
object ClassName {}
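One thing worth checking while the error persists: the object named in mainClass must actually define a main method (or extend App); otherwise there is nothing for java -jar to invoke. A minimal sketch, using the hypothetical names from above:
package com.orgname.name

object ClassName {
  // java -jar invokes this JVM entry point
  def main(args: Array[String]): Unit =
    println("ClassName is running")
}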

sbt-assembly does not pick up configuration specific settings

I am updating an old 0.7.x build file for the tool sbt, which in the meantime has thankfully dropped the "simple" from its name.
Something that once worked no longer does. I had different config entries for platform-specific assembly tasks. These include specific exclusion filters, which for some reason are now called assemblyExcludedJars instead of excludedJars, and specific jar names, which for some reason are now called assemblyJarName instead of jarName.
Basically:
val Foo = config("foo") extend Compile

lazy val assemblyFoo = TaskKey[File]("assembly-foo")

lazy val root = Project(id = "root", base = file("."))
  // .configs(Foo) // needed? doesn't change anything
  .settings(
    inConfig(Foo)(inTask(assembly) {
      assemblyJarName := "wtf.jar"
    }),
    scalaVersion := "2.11.7",
    assemblyFoo <<= assembly in Foo
  )
Now I would expect that if I run sbt assembly-foo or sbt foo:assembly, it would produce a file wtf.jar. But I am getting the default root-assembly-0.1-snapshot.jar. The same problem happens when I try to specify assemblyExcludedJars: they are simply ignored and the jars are still included.
If I remove the inConfig it works:
lazy val root = Project(id = "root", base = file("."))
  .settings(
    inTask(assembly) {
      assemblyJarName := "wtf.jar"
    },
    scalaVersion := "2.11.7",
    assemblyFoo <<= assembly in Foo
  )
But now I cannot use different jar names for different configurations (which is the whole point).
As described in a blog post by one of sbt's authors, who is also the author of sbt-assembly, this should work. It was also described in another Stack Overflow question. But the example requires an antique version of sbt-assembly (0.9.0 from 2013, before auto plugins etc.) and doesn't seem to apply to current versions.
If one defines a new configuration, one has to redefine (?) all the tasks one is about to use. Apparently, for sbt-assembly this means adding baseAssemblySettings:
val Foo = config("foo") extend Compile

lazy val assemblyFoo = TaskKey[File]("assembly-foo")

lazy val root = Project(id = "root", base = file("."))
  .settings(
    inConfig(Foo)(baseAssemblySettings /* !!! */ ++ inTask(assembly) {
      jarName := "wtf.jar"
    }),
    scalaVersion := "2.11.7",
    assemblyFoo := (assembly in Foo).value
  )
Tested with sbt 0.13.9 and sbt-assembly 0.14.1.

sbt: publish generated sources

I have a project where part of the sources are generated (sourceGenerators in Compile). I noticed that these sources are not published with publishLocal or publishSigned, which is reasonable in most scenarios. In this case it is unfortunate, because when you use this project/library as a dependency, you cannot look up the generated sources, for example in IntelliJ, even if the other sources of the project have been downloaded.
Can I configure sbt's publishing settings to include the generated sources in the Maven -sources.jar?
So, just to be complete, this was my solution based on #pfn's answer:
mappings in (Compile, packageSrc) ++= {
  val base  = (sourceManaged  in Compile).value
  val files = (managedSources in Compile).value
  files.map { f => (f, f.relativeTo(base).get.getPath) }
}
For reference, #pfn's answer was this one-liner:
mappings in (Compile, packageSrc) := (managedSources in Compile).value map (s => (s, s.getName))
Just like #0__'s answer, but ported to the 'new' sbt syntax, i.e. without deprecation warnings.
Compile / packageSrc / mappings ++= {
  val base  = (Compile / sourceManaged).value
  val files = (Compile / managedSources).value
  files.map(f => (f, f.relativeTo(base).get.getPath))
}

Compile with different settings in different commands

I have a project defined as follows:
lazy val tests = Project(
  id = "tests",
  base = file("tests")
) settings (
  commands += testScalalib
) settings (
  sharedSettings ++ useShowRawPluginSettings ++ usePluginSettings: _*
) settings (
  libraryDependencies <+= (scalaVersion)("org.scala-lang" % "scala-reflect" % _),
  libraryDependencies <+= (scalaVersion)("org.scala-lang" % "scala-compiler" % _),
  libraryDependencies += "org.tukaani" % "xz" % "1.5",
  scalacOptions ++= Seq()
)
I would like to have three different commands which compile only some of the files inside this project. The testScalalib command added above, for instance, is supposed to compile only certain specific files.
My best attempt so far is:
lazy val testScalalib: Command = Command.command("testScalalib") { state =>
  val extracted = Project extract state
  import extracted._
  val newState = append(Seq(
    (sources in Compile) <<= (sources in Compile).map(
      _.filter(f => !f.getAbsolutePath.contains("scalalibrary/") && f.name != "Typers.scala"))),
    state)
  runTask(compile in Compile, newState)
  state
}
Unfortunately when I use the command, it still compiles the whole project, not just the specified files...
Do you have any idea how I should do that?
I think your best bet would be to create different configurations, like compile and test, with settings that suit your needs. Read Scopes in the official sbt documentation and/or How to define another compilation scope in SBT?
I would not create additional commands; I would create an extra configuration, as #JacekLaskowski suggested, based on the answer he cited.
Here is how you can do it using sbt 0.13.2 and Build.scala (you could of course do the same in build.sbt, or with different syntax in older sbt versions):
import sbt._
import Keys._

object MyBuild extends Build {

  lazy val Api = config("api")

  val root = Project(id = "root", base = file(".")).configs(Api).settings(custom: _*)

  lazy val custom: Seq[Setting[_]] = inConfig(Api)(Defaults.configSettings ++ Seq(
    unmanagedSourceDirectories := (unmanagedSourceDirectories in Compile).value,
    classDirectory := (classDirectory in Compile).value,
    dependencyClasspath := (dependencyClasspath in Compile).value,
    unmanagedSources := {
      unmanagedSources.value.filter(f =>
        !f.getAbsolutePath.contains("scalalibrary/") && f.name != "Typers.scala")
    }
  ))
}
Now when you call compile, everything will get compiled, but when you call api:compile, only the classes matching the filter predicate will be.
BTW, you may also want to look into defining different unmanagedSourceDirectories and/or an includeFilter; see the sketch below.
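For the filter-based variant, a minimal sketch, assuming the same Api configuration as above (note that name filters match file names only, so the path-based scalalibrary/ exclusion would still need a custom FileFilter):
// Exclude Typers.scala (and hidden files) from the sources sbt picks up
// in the Api configuration (sbt 0.13 syntax).
excludeFilter in (Api, unmanagedSources) := HiddenFileFilter || "Typers.scala"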

How to exclude assembly from package in sbt?

I have a build.scala file with a section that looks something like the clip below.
I use sbt-assembly to build a jar file of all dependent libs for deployment.
This builds fine. My problem: I run assembly and it builds a ~16MB core-deps jar; then I run package to build the core.jar file. It builds core.jar, but then it overwrites my core-deps.jar with an empty file (because core-deps has no code of its own).
How can I build both core.jar and core-deps.jar, and not have package blow away core-deps.jar?
lazy val deps = Project("core-deps", file("."),
  settings = basicSettings ++
    sbtassembly.Plugin.assemblySettings ++
    Seq(assemblyOption in assembly ~= { _.copy(includeScala = false) }) ++
    addArtifact(Artifact("core-deps", "core-deps"), sbtassembly.Plugin.AssemblyKeys.assembly) ++
    Seq(
      libraryDependencies ++=
        // Master list of all used libraries, so everything gets added to the deps.jar file when you run assembly
        compile(commons_exec, commons_codec, commons_lang, casbah, googleCLHM, joda_time, scalajack,
          spray_routing, spray_can, spray_client, spray_caching, akka_actor, akka_cluster, akka_slf4j,
          prettytime, mongo_java, casbah_gridfs, typesafe_config, logback),
      jarName in assembly <<= (scalaVersion, version) map { (scalaVersion, version) =>
        "core-deps_" + scalaVersion.dropRight(2) + "-" + version + ".jar"
      }
    )) aggregate(core)

lazy val core = project
  .settings(basicSettings: _*)
  .settings(buildSettings: _*)
  .settings(libraryDependencies ++=
    compile(commons_exec, prettytime, commons_codec, casbah, googleCLHM, scalajack, casbah_gridfs,
      typesafe_config, spray_routing, spray_client, spray_can, spray_caching, akka_actor, akka_slf4j,
      akka_cluster, logback) ++
    test(scalatest, parboiled, spray_client)
  )
Why not use the assemblyPackageDependency task that comes with sbt-assembly? See "Excluding Scala library, your project, or deps JARs" in the sbt-assembly documentation.
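A minimal sketch of that approach; assemblyJarName in assemblyPackageDependency is sbt-assembly's key for naming the dependency-only jar (the setting is optional):
// assemblyPackageDependency builds a jar containing only the dependencies,
// while package keeps producing the jar with your own classes:
//   sbt assemblyPackageDependency   -> dependency-only jar
//   sbt package                     -> core.jar
assemblyJarName in assemblyPackageDependency := "core-deps.jar"
Since the two tasks write different files, package no longer clobbers the dependency jar.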
If for some reason you really want to disable the package task in the core-deps project, you could try rewiring packageBin:
packageBin := (outputPath in assembly).value
That will do nothing but return the file name.