Spark / scala : publish a single fat jar with sbt - scala

I am trying to deploy a single Fat Jar from a spark/scala project to a private nexus repository. The publish works fine but I get a large amount of files instead of one single assembly Jar :
*-assembly.jar
*-assembly.jar.md5
*-assembly.jar.sha1
*-javadoc.jar
*-javadoc.jar.md5
*-javadoc.jar.sha1
*-source.jar
*-source.jar.md5
*-source.jar.sha1
*.jar
*.jar.md5
*.jar.sha1
*-pom.jar
*-pom.jar.md5
*-pom.jar.sha1
My built.sbt is :
name := "SampleApp"
version := "0.1-SNAPSHOT"
scalaVersion := "2.12.14"
ThisBuild / useCoursier := false
libraryDependencies ++= Seq(
"com.github.scopt" %% "scopt" % "4.0.1",
"org.apache.spark" %% "spark-sql" % "3.1.1" % "provided"
)
resolvers ++= Seq(
"confluent" at "https://packages.confluent.io/maven/"
)
assembly / assemblyMergeStrategy := {
case PathList("META-INF","services",xs # _*) => MergeStrategy.filterDistinctLines
case PathList("META-INF",xs # _*) => MergeStrategy.discard
case "application.conf" => MergeStrategy.concat
case _ => MergeStrategy.first
}
assembly / assemblyExcludedJars := {
val cp = (assembly / fullClasspath).value
cp filter { f =>
f.data.getName.contains("hadoop-hdfs-2")
f.data.getName.contains("hadoop-client")
}
}
assembly / artifact := {
val art = (assembly / artifact).value
art.withClassifier(Some("assembly"))
}
assembly / assemblyJarName := s"${name.value}-${version.value}.jar"
addArtifact(assembly / artifact, assembly)
resolvers += ("Sonatype Nexus Repository Manager" at "http://localhost:8081/repository/app/").withAllowInsecureProtocol(true)
credentials += Credentials("Sonatype Nexus Repository Manager", "localhost:8081", "user", "pass")
publishTo := {
val nexus = "http://localhost:8081/repository/app/"
if (isSnapshot.value) {
Some("snapshots" at nexus + "test-SNAPSHOT")
} else
Some("releases" at nexus + "test-RELEASE")
}
Is there a way to filter files with sbt before the publish in order to get only the *-assembly.jar ? Thanks a lot.

This should disable the publish of unnecessary files in my case :
publishArtifact in (Compile, packageBin) := false
publishArtifact in (Compile, packageDoc) := false
publishArtifact in (Compile, packageSrc) := false

Related

sbt run Play as a sumodule

I was wondering if I can run Play in development as a submodule?
Here is the project structure:
Root
submodule1
submodule2(Play)
submodule3
......
There is only one build.sbt file under the root level, which specified the structure of the project.
Previously, I had the Play module as a separate project, which has its own build.sbt, and I could simply run it in development mode with sbt run.
However, after I combined the Play module with the other modules into one project, sbt "project projectName" run can still start the Play module in development mode, but when I tried to hit any endpoint, I got injection error, such as No implementation for models.services.UserService was bound.
Here is a portion of the build.sbt. emberconflagration is the Play project.
lazy val root = project
.in(file("."))
.aggregate(embercore)
.aggregate(emberdataset)
.aggregate(emberconflagration)
.aggregate(emberservice)
.aggregate(embersession)
.aggregate(emberetl)
.aggregate(embermodel)
.aggregate(emberclient)
.aggregate(emberserver)
.aggregate(emberservermongo)
.aggregate(emberstreaming)
.settings(commonSettings: _*)
.settings(name := "ember-conflagration-root")
lazy val embercore = (project in file("./embercore"))
.settings(commonSettings: _*)
.settings(testSettings: _*)
.settings(
name := "embercore",
libraryDependencies ++= Seq(
Libraries.scalatest,
Libraries.play_json
),
resolvers ++= Seq(
"Apache Repository" at "https://repository.apache.org/content/repositories/releases/",
"Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
Resolver.sonatypeRepo("public")
),
javaOptions in assembly += "-xmx6g"
)
lazy val emberservice = (project in file("./emberservice"))
.settings(commonSettings: _*)
.settings(testSettings: _*)
.settings(
name := "emberservice",
libraryDependencies ++= Seq(
Libraries.scalatest,
Libraries.play_json,
Libraries.scala_j
),
resolvers ++= Seq(
"Apache Repository" at "https://repository.apache.org/content/repositories/releases/",
"Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
Resolver.sonatypeRepo("public")
),
javaOptions in assembly += "-xmx6g"
)
.dependsOn(embercore)
lazy val emberdataset = (project in file("./emberdataset"))
.settings(commonSettings: _*)
.settings(testSettings: _*)
.settings(
name := "emberdataset",
libraryDependencies ++= Seq(
Libraries.scalatest,
Libraries.spark_core_conflP,
Libraries.spark_mllib_conflP,
Libraries.spark_sql_conflP,
Libraries.mysql_connector,
Libraries.spark_xml
),
resolvers ++= Seq(
"Apache Repository" at "https://repository.apache.org/content/repositories/releases/",
"Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
Resolver.sonatypeRepo("public")
),
javaOptions in assembly += "-xmx6g"
)
.dependsOn(embercore)
lazy val embersession = (project in file("./embersession"))
.enablePlugins(LauncherJarPlugin)
.settings(commonSettings: _*)
.settings(testSettings: _*)
.settings(
name := "embersession",
libraryDependencies ++= Seq(
Libraries.scalatest,
Libraries.h2o_sparkling_water_core
exclude ("com.fasterxml.jackson.core", "jackson-core")
exclude ("com.fasterxml.jackson.core", "jackson-databind")
exclude ("javax.servlet", "servlet-api"),
// exclude ("org.apache.spark", "spark-sql_2.11") // Comment out for standard build. Uncomment for EmberSession assembly.
// exclude ("org.apache.spark", "spark-core_2.11") // Comment out for standard build. Uncomment for EmberSession assembly.
// exclude ("org.apache.spark", "spark-mllib_2.11") // Comment out for standard build. Uncomment for EmberSession assembly.
// exclude ("org.apache.hadoop", "hadoop-common"), // Comment out for standard build. Uncomment for EmberSession assembly.
Libraries.jackson_scala,
Libraries.duke,
Libraries.scala_library,
Libraries.spark_core_conflP,
Libraries.spark_mllib_conflP,
Libraries.spark_sql_conflP
),
mainClass in assembly := Some("com.metistream.ember.embersession.Test"),
javaOptions in assembly += "-xmx6g",
resolvers ++= Seq(
"Apache Repository" at "https://repository.apache.org/content/repositories/releases/",
"Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
Resolver.sonatypeRepo("public")
),
assemblyShadeRules in assembly ++= Seq(
ShadeRule.rename("com.esotericsoftware.kryo.**" -> "emberkryo.#1").inAll,
ShadeRule.rename("com.fasterxml.jackson.**" -> "emberjackson.#1").inAll,
ShadeRule.rename("play.api.**" -> "emberplay.play.api.#1").inAll
),
assemblyMergeStrategy in assembly := {
case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
case PathList("reference.conf") => MergeStrategy.concat
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
)
.dependsOn(embercore, emberdataset, emberservice)
lazy val emberconflagration = (project in file("security-layer"))
.enablePlugins(PlayScala)
.settings(commonSettings: _*)
.settings(testSettings: _*)
.settings(
name := "ember-conflagration",
libraryDependencies ++= Seq(
jdbc,
cache,
ws,
filters,
Libraries.play2_reactivemongo,
Libraries.mockito,
Libraries.embed_mongo,
Libraries.play_silhouette,
Libraries.silhouette_password,
Libraries.silhouette_persistence,
Libraries.silhouette_crypto,
Libraries.silhouette_testkit,
Libraries.scala_guice,
Libraries.ficus,
Libraries.mleap_runtime,
Libraries.google_api_dataproc,
Libraries.scalaj_http,
Libraries.google_core,
Libraries.google_client,
Libraries.google_sqladmin,
Libraries.google_cloud_compute,
Libraries.google_api_services,
Libraries.google_cloud_storage,
Libraries.postgresql_connector,
Libraries.jersey_media_glass,
Libraries.jersey_core_glass,
Libraries.jackson_xml,
Libraries.jackson_scala,
Libraries.janino,
Libraries.play_json_extensions,
Libraries.hapi_base,
Libraries.hapi_structures_v21,
Libraries.hapi_structures_v22,
Libraries.hapi_structures_v23,
Libraries.hapi_structures_v231,
Libraries.hapi_structures_v24,
Libraries.hapi_structures_v25,
Libraries.hapi_structures_v251,
Libraries.hapi_structures_v26,
Libraries.play_quartz_extension,
Libraries.elastic4s_core,
Libraries.elastic4s_http,
Libraries.scalatest,
specs2 % Test,
Libraries.uimaj_core,
Libraries.spark_core_confl,
Libraries.spark_mllib_confl,
Libraries.spark_sql_confl,
Libraries.unbound_id,
Libraries.swagger
),
coverageExcludedPackages := "<empty>;Reverse.*;router\\..*",
// Assembly Settings
assemblyShadeRules in assembly ++= Seq(
ShadeRule
.rename("com.google.common.**" -> "my_guava.#1")
.inLibrary("com.google.api-client" % "google-api-client" % "1.22.0")
),
mainClass in assembly := Some("play.core.server.ProdServerStart"),
fullClasspath in assembly += Attributed.blank(PlayKeys.playPackageAssets.value),
assemblyMergeStrategy in assembly := {
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", xs # _*) => MergeStrategy.first
case x => MergeStrategy.first
},
projectDependencies := {
Seq(
(projectID in emberdataset).value.excludeAll(ExclusionRule(organization = "org.slf4j"),
ExclusionRule(organization = "io.netty"))
)
},
fork in run := false,
resolvers ++= Seq(
"elasticsearch-releases" at "https://artifacts.elastic.co/maven",
"sonatype snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
Resolver.bintrayRepo("iheartradio", "maven"),
"Atlassian Releases" at "https://maven.atlassian.com/public/",
Resolver.url("Typesafe Ivy releases", url("https://repo.typesafe.com/typesafe/ivy-releases"))(
Resolver.ivyStylePatterns)
),
javaOptions in assembly += "-xmx6g"
)
.dependsOn(embercore, emberdataset, emberetl)
I think it is just your particular app that doesn't work. This is possibly because you have multiple packages which are colliding with each other. Here's an example that works.
Directory structure
./build.sbt
./mod1
./mod1/src
./mod1/src/main
./mod1/src/main/scala
./mod1/src/main/scala/Hello.scala
./mod2
./mod2/app
./mod2/app/controllers
./mod2/app/controllers/HomeController.scala
./mod2/app/modules/MainModule.scala
./mod2/conf
./mod2/conf/application.conf
./mod2/conf/logback.xml
./mod2/conf/messages
./mod2/conf/routes
./mod3
./mod3/src
./mod3/src/main
./mod3/src/main/scala
./mod3/src/main/scala/Hello.scala
./project/plugins.sbt
Important files:
build.sbt
name := """root"""
organization := "com.example"
version := "1.0-SNAPSHOT"
lazy val mod1 = project in file("mod1")
lazy val mod2 = (project in file("mod2"))
.enablePlugins(PlayScala)
.settings(
name := """myplayproject""",
organization := "com.example",
version := "1.0-SNAPSHOT",
scalaVersion := "2.12.6",
libraryDependencies += guice,
libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.2" % Test
)
lazy val mod3 = project in file("mod3")
lazy val root = (project in file("."))
.aggregate(mod1)
.aggregate(mod2)
.aggregate(mod3)
HomeController.scala
package controllers
import javax.inject._
import play.api._
import play.api.mvc._
trait MyServiceLike {
def getString: String
}
class MyService extends MyServiceLike {
override def getString: String = "hello"
}
#Singleton
class HomeController #Inject()(cc: ControllerComponents, myService: MyServiceLike) extends AbstractController(cc) {
def index() = Action { implicit request: Request[AnyContent] =>
Ok(myService.getString)
}
}
project/plugins.sbt
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.6.16")
mod2/conf/application.conf
play.modules.enabled += "modules.MainModule"
mod2/app/modules/MainModule.scala
package modules
import controllers.MyService
import controllers.MyServiceLike
import play.api.Configuration
import play.api.Environment
import play.api.inject.Binding
import play.api.inject.Module
class MainModule extends Module {
override def bindings(environment: Environment, configuration: Configuration): Seq[Binding[_]] = Seq(
bind[MyServiceLike].to[MyService]
)
}
Run
Then, you can run by starting sbt:
project mod2
run

How to avoid writing pom when running docker:publishLocal?

I have two subprojects on build.sbt and one depends on the other (the cli depends on and aggregates core). The core will be published as a library and the cli will be publish as a docker image. The problem is when I do cli/docker:publishLocal I can't avoid the pom being written... How to avoid this? This is the current build.sbt file:
import sbt.Keys._
import sbt._
val scalaBinaryVersionNumber = "2.12"
val scalaVersionNumber = s"$scalaBinaryVersionNumber.4"
resolvers += Resolver.bintrayIvyRepo("sbt", "sbt-plugin-releases")
lazy val aggregatedProjects: Seq[ProjectReference] = Seq(core, cli)
lazy val testDependencies = Dependencies.specs2.map(_ % Test)
lazy val root = project
.in(file("."))
.settings(name := "root")
.settings(inThisBuild(List(
//Credentials for sonatype
credentials += Credentials(
"Sonatype Nexus Repository Manager",
"oss.sonatype.org",
sys.env.getOrElse("SONATYPE_USER", "username"),
sys.env.getOrElse("SONATYPE_PASSWORD", "password")),
scalaVersion := scalaVersionNumber,
version := "0.1.0-SNAPSHOT",
organization := "com.test",
scalacOptions ++= Common.compilerFlags,
scalacOptions.in(Test) ++= Seq("-Yrangepos"),
scalacOptions.in(Compile, console) --= Seq("-Ywarn-unused:imports", "-Xfatal-warnings"))))
.aggregate(aggregatedProjects: _*)
.settings(publish := {}, publishLocal := {}, publishArtifact := false)
lazy val core = project
.in(file("core"))
.settings(name := "core")
.settings(
// App Dependencies
libraryDependencies ++= Seq(
Dependencies.caseApp,
Dependencies.betterFiles,
Dependencies.jodaTime,
Dependencies.fansi,
Dependencies.scalajHttp,
Dependencies.cats) ++
Dependencies.circe ++
Dependencies.jackson ++
Dependencies.log4s
// Test Dependencies
libraryDependencies ++= testDependencies)
.settings(
// Sonatype repository settings
publishMavenStyle := true,
publishArtifact.in(Test) := false,
//publishArtifact.in(makePom.in(Docker)) := false,
publish.in(Docker) := {},
publishLocal.in(Docker) := {},
pomIncludeRepository := { _ =>
false
},
publishTo := sonatypePublishTo.value)
.settings(pomExtra := <scm>
<url>https://github.com/test</url>
<connection>scm:git:git#github.com:test.git</connection>
<developerConnection>scm:git:https://github.com/test.git</developerConnection>
</scm>)
lazy val cli = project
.in(file("cli"))
.enablePlugins(JavaAppPackaging)
.enablePlugins(DockerPlugin)
.settings(name := "cli")
.settings(Common.dockerSettings: _*)
.settings(Common.genericSettings: _*)
.settings(
//publish := publish.in(Docker).value,
//publishLocal := publishLocal.in(Docker).value,
publishArtifact := false)
.settings(libraryDependencies ++= testDependencies)
.dependsOn(core)
.aggregate(core)
// Scapegoat
scalaVersion in ThisBuild := scalaVersionNumber
scalaBinaryVersion in ThisBuild := scalaBinaryVersionNumber
scapegoatDisabledInspections in ThisBuild := Seq()
scapegoatVersion in ThisBuild := "1.3.4"
As you can see, I already tried to add "publishArtifact.in(makePom.in(Docker)) := false" to the core settings but this doesn't write the pom when I do core/publishLocal... I just want to avoid writing the pom when I do cli/docker:publishLocal.

How to include test dependencies in sbt-assembly jar?

I am unable to package my test dependencies in my test assembly jar. Here is an excerpt from my build.sbt:
...
name := "project"
scalaVersion := "2.10.6"
assemblyOption in (Compile, assembly) := (assemblyOption in (Compile, assembly)).value.copy(includeScala = false)
fork in Test := true
parallelExecution in IntegrationTest := false
lazy val root = project.in(file(".")).configs(IntegrationTest.extend(Test)).settings(Defaults.itSettings: _ *)
Project.inConfig(Test)(baseAssemblySettings)
test in (Test, assembly) := {}
assemblyOption in (Test, assembly) := (assemblyOption in (Test, assembly)).value.copy(includeScala = false, includeDependency = true)
assemblyJarName in (Test, assembly) := s"${name.value}-test.jar"
fullClasspath in (Test, assembly) := {
val cp = (fullClasspath in Test).value
cp.filter{ file => (file.data.name contains "classes") || (file.data.name contains "test-classes")} ++ (fullClasspath in Runtime).value
}
libraryDependencies ++= Seq(
...
"com.typesafe.play" %% "play-json" % "2.3.10" % "test" excludeAll ExclusionRule(organization = "joda-time"),
...
)
...
When I assemble my fat jar using sbt test:assembly, is produces the fat jar project-test.jar, but the play-json dependencies aren't being packaged in:
$ jar tf /path/to/project-test.jar | grep play
$
However, if I remove the "test" configuration from the play-json dep (i.e. "com.typesafe.play" %% "play-json" % "2.3.10" excludeAll ExclusionRule(organization = "joda-time")), I can see it being included:
$ jar tf /path/to/project-test.jar | grep play
...
play/libs/Json.class
...
$
Am I doing anything wrong and/or missing anything? My goal here is to include the play-json library in ONLY the test:assembly jar and NOT the assembly jar
I have left out a crucial part in the original build.sbt excerpt I posted above which turned out to be the cause of the issuse:
fullClasspath in (Test, assembly) := {
val cp = (fullClasspath in Test).value
cp.filter{ file => (file.data.name contains "classes") || (file.data.name contains "test-classes")} ++ (fullClasspath in Runtime).value
}
This code block was essentially filter out deps from the test classpath. We include this to avoid painful merge conflicts. I fixed this by adding logic to include the play-json dep that was needed:
fullClasspath in (Test, assembly) := {
val cp = (fullClasspath in Test).value
cp.filter{ file =>
(file.data.name contains "classes") ||
(file.data.name contains "test-classes") ||
// sorta hacky
(file.data.name contains "play")
} ++ (fullClasspath in Runtime).value
}

SBT assembly falis

I am running a spark job through intellij. Job executes and gives me output. i need to take this job as jar file to server and run, but when i try to do sbt assembly it throws below error:
[error] Not a valid command: assembly
[error] Not a valid project ID: assembly
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assembly
[error] assembly
my sbt version is 0.13.8
below is my build.sbt file:
import sbt._, Keys._
name := "mobilewalla"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.0.0",
"org.apache.spark" %% "spark-sql" % "2.0.0")
i added a file assembly.sbt under project dir. it contains:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
what am i missing here
Add these lines in your build.sbt
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
mainClass in assembly := Some("com.SparkMain")
resolvers += "spray repo" at "http://repo.spray.io"
assemblyJarName in assembly := "streaming-api.jar"
and include these lines in your plugins.sbt file
addSbtPlugin("io.spray" % "sbt-revolver" % "0.7.2")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
To assemble the multiple jars to one u need add below plugin in plugins.sbt under project directory.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
If u need to customize the assembled jar to trigger specific MainClass take example assembly.sbt
import sbtassembly.Plugin.AssemblyKeys._
Project.inConfig(Compile)(baseAssemblySettings)
mainClass in (Compile, assembly) := Some("<main application name with package path>")
jarName in (Compile, assembly) := s"${name.value}-${version.value}-dist.jar"
//below is merge strategy to make what all file need to exclude or include
mergeStrategy in (Compile, assembly) <<= (mergeStrategy in (Compile, assembly)) {
(old) => {
case PathList(ps # _*) if ps.last endsWith ".html" =>MergeStrategy.first
case "META-INF/MANIFEST.MF" => MergeStrategy.discard
case x => old(x)
}
}

What is the proper play-mini project setup for a proper production deployment

Play mini doesn't work like a play project at all really unless i am missing something. it runs within sbt and you cannot use play commands.
https://github.com/typesafehub/play2-mini
So how do you deploy this stuff to production? I've tried one-jar and assembly too and it just doesn't work for me
Ive stried the start-script/stage approach but it cannto seem to find my mainclass:
sbt
>add-start-script-tasks
>stage
[info] Wrote start script for mainClass := None to /Users/rmedlin/rtbv2/target/start
This is my Build.scala. Ive also tried: mainClass in (Compile, stage, run) and many other combinations
object Build extends Build {
override lazy val settings = super.settings
lazy val root = Project(id = "rtbv2",
base = file("."), settings = Project.defaultSettings).settings(
resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/",
resolvers += "Typesafe Snapshot Repo" at "http://repo.typesafe.com/typesafe/snapshots/",
libraryDependencies += "com.typesafe" %% "play-mini" % "2.0.1",
mainClass in (Compile, run) := Some("play.core.server.NettyServer"))
}
my Build.scala was incorrect and i was able to get the assembly command working:
trait ConfigureScalaBuild {
lazy val typesafe = "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
lazy val typesafeSnapshot = "Typesafe Snapshots Repository" at "http://repo.typesafe.com/typesafe/snapshots/"
val netty = Some("play.core.server.NettyServer")
def scalaMiniProject(org: String, name: String, buildVersion: String, baseFile: java.io.File = file(".")) = Project(id = name, base = baseFile, settings = Project.defaultSettings ++ assemblySettings).settings(
version := buildVersion,
organization := org,
resolvers += typesafe,
resolvers += typesafeSnapshot,
logManager <<= extraLoggers(com.typesafe.util.Sbt.logger),
libraryDependencies += "com.typesafe" %% "play-mini" % "2.0.1",
mainClass in (Compile, run) := netty,
mainClass in assembly := netty,
ivyXML := <dependencies> <exclude org="org.springframework"/> </dependencies>
)
}