How to exclude test dependencies with sbt-assembly - scala

I have an sbt project that I am trying to build into a jar with the sbt-assembly plugin.
build.sbt:
name := "project-name"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test",
// spark-hive dependencies for DataFrameSuiteBase. https://github.com/holdenk/spark-testing-base/issues/143
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
//"org.apache.hadoop" % "hadoop-aws" % "3.1.1"
"org.json" % "json" % "20180813"
)
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
test in assembly := {}
// https://github.com/holdenk/spark-testing-base
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
When I build the project with sbt assembly, the resulting jar contains /org/junit/... and /org/opentest4j/... files
Is there any way to not include these test related files in the final jar?
I have tried replacing the line:
"org.scalatest" %% "scalatest" % "3.0.5" % "test"
with:
"org.scalatest" %% "scalatest" % "3.0.5" % "provided"
I am also wondering how these files end up in the jar, since junit is not referenced anywhere in build.sbt (the project does contain JUnit tests, however)?
Updated:
name := "project-name"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"
val excludeJUnitBinding = ExclusionRule(organization = "junit")
libraryDependencies ++= Seq(
// Provided
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
// Test
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
// Necessary
"org.json" % "json" % "20180813"
)
excludeDependencies += excludeJUnitBinding
// https://stackoverflow.com/questions/25144484/sbt-assembly-deduplication-found-error
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
// https://github.com/holdenk/spark-testing-base
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false

To exclude certain transitive dependencies of a dependency, use the excludeAll or exclude methods.
The exclude method should be used when a pom will be published for the project. It requires the organization and module name to exclude.
For example:
libraryDependencies +=
"log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")
The excludeAll method is more flexible, but because it cannot be represented in a pom.xml, it should only be used when a pom doesn’t need to be generated.
For example,
libraryDependencies +=
"log4j" % "log4j" % "1.2.15" excludeAll(
ExclusionRule(organization = "com.sun.jdmk"),
ExclusionRule(organization = "com.sun.jmx"),
ExclusionRule(organization = "javax.jms")
)
In certain cases a transitive dependency should be excluded from all dependencies. This can be achieved by setting up ExclusionRules in excludeDependencies (for sbt 0.13.8 and above).
excludeDependencies ++= Seq(
ExclusionRule("commons-logging", "commons-logging")
)
The JUnit jar is pulled in as a transitive dependency of the dependencies below (a sketch for verifying this follows the list).
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" //(junit)
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"// (junit)
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" //(org.junit)
To exclude the junit jar, update your dependencies as below.
val excludeJUnitBinding = ExclusionRule(organization = "junit")
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" excludeAll(excludeJUnitBinding)
Update:
Please update your build.sbt as below.
resolvers += Resolver.url("bintray-sbt-plugins",
url("https://dl.bintray.com/eed3si9n/sbt-plugins/"))(Resolver.ivyStylePatterns)
val excludeJUnitBinding = ExclusionRule(organization = "junit")
libraryDependencies ++= Seq(
// Provided
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
//"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
//"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
//"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
// Test
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
// Necessary
"org.json" % "json" % "20180813"
)
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
I have tried this and the junit jar is no longer downloaded.
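To double-check the assembled jar after these changes, a small throwaway Scala snippet can list any leftover junit/opentest4j entries; the jar path is illustrative and depends on your project name and version:
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object CheckAssembly extends App {
  val jar = new JarFile("target/scala-2.11/project-name-assembly-0.1.jar") // adjust the path
  jar.entries.asScala
    .map(_.getName)
    .filter(n => n.startsWith("org/junit/") || n.startsWith("org/opentest4j/"))
    .foreach(println) // no output means the test classes are gone
  jar.close()
}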

Related

Unable to write files after scala and spark upgrade

My project was previously using Scala 2.11.12, which I have upgraded to 2.12.10, and the Spark version has been upgraded from 2.4.0 to 3.1.2. See the build.sbt file below with the rest of the project dependencies and versions:
scalaVersion := "2.12.10"
val sparkVersion = "3.1.2"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "org.xerial.snappy" % "snappy-java" % "1.1.4"
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.8"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.8" % "test, it"
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "3.1.2_1.1.0" % "test, it"
libraryDependencies += "com.github.pureconfig" %% "pureconfig" % "0.12.1"
libraryDependencies += "com.typesafe" % "config" % "1.3.2"
libraryDependencies += "org.pegdown" % "pegdown" % "1.1.0" % "test, it"
libraryDependencies += "com.github.scopt" %% "scopt" % "3.7.1"
libraryDependencies += "com.github.pathikrit" %% "better-files" % "3.8.0"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2"
libraryDependencies += "com.amazon.deequ" % "deequ" % "2.0.0-spark-3.1" excludeAll (
ExclusionRule(organization = "org.apache.spark")
)
libraryDependencies += "net.liftweb" %% "lift-json" % "3.4.0"
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.13.1"
The app builds fine after the upgrade, but it is unable to write files to the filesystem, which was working fine before the upgrade. I haven't made any code changes to the write logic.
The relevant portion of code that writes to the files is shown below.
// Imports assumed from context: Hadoop's FileSystem, Path and IOUtils.
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

val inputStream = getClass.getResourceAsStream(resourcePath)
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val output = fs.create(new Path(outputPath))
IOUtils.copyBytes(inputStream, output.getWrappedStream, conf, true)
I am wondering if IOUtils is not compatible with the new Scala/Spark versions?

New sbt application using the akka-http template, how to determine resolvers and add maven central?

I have a new sbt application that I built using the akka http g8 template.
I am trying to add reactivemongo 1.0 to my build and I am getting this error:
not found: https://repo1.maven.org/maven2/org/reactivemongo/reactivemongo_2.13/1.0/reactivemongo_2.13-1.0.pom
The documentation says this library is in maven central.
How can I determine which resolver my project is using by default currently in sbt?
Is it possible that this library is not built for scala 2.13.3 or 2.13.1?
How can I debug this type of error?
Thanks!
build.sbt:
import Dependencies._
lazy val akkaHttpVersion = "10.2.1"
lazy val akkaVersion = "2.6.10"
lazy val root = (project in file("."))
.settings(
inThisBuild(
List(
organization := "com.example",
scalaVersion := "2.13.3"
)
),
name := "akka-http",
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-http" % akkaHttpVersion,
"com.typesafe.akka" %% "akka-http-spray-json" % akkaHttpVersion,
"com.typesafe.akka" %% "akka-actor-typed" % akkaVersion,
"com.typesafe.akka" %% "akka-stream" % akkaVersion,
"ch.qos.logback" % "logback-classic" % "1.2.3",
"com.softwaremill.macwire" %% "macros" % "2.3.3" % "provided",
"com.softwaremill.macwire" %% "util" % "2.3.3" % "provided",
"com.github.blemale" %% "scaffeine" % "3.1.0" % "compile",
"org.typelevel" %% "cats-core" % "2.1.1",
"com.lihaoyi" %% "scalatags" % "0.8.2",
"com.github.pureconfig" %% "pureconfig" % "0.13.0",
"org.reactivemongo" %% "reactivemongo" % "1.0",
"com.typesafe.akka" %% "akka-http-testkit" % akkaHttpVersion % Test,
"com.typesafe.akka" %% "akka-actor-testkit-typed" % akkaVersion % Test,
"org.scalatest" %% "scalatest" % "3.0.8" % Test
)
)
.enablePlugins(JavaAppPackaging)
Can you try replacing "org.reactivemongo" %% "reactivemongo" % "1.0" with "org.reactivemongo" %% "reactivemongo" % "1.0.0" % "provided"?
I copied it from the Maven Repository: https://mvnrepository.com/artifact/org.reactivemongo/reactivemongo_2.13/1.0.0
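Regarding the resolver questions above: Maven Central is part of sbt's default resolver chain, so it normally does not need to be added by hand, and the chain in use can be inspected from the sbt shell. A small sketch; the explicit resolver line is only needed if the defaults have been overridden:
// build.sbt
resolvers += Resolver.DefaultMavenRepository // usually redundant; Maven Central is a default
// from the sbt shell:
// > show resolvers      // resolvers declared in the build
// > show fullResolvers  // the complete chain actually used, including defaults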

Failed to load class in Spark-submit

I'm running a jar file with spark-submit, but I keep getting this error:
Error: Failed to load class antarctic.DataQuality.
This is the command:
spark-submit --class antarctic.DataQuality --master local[*] --deploy-mode client --jars "path/to/file.jar" "arg 1" "arg 2" "arg 3" "arg 4"
This is the structure of the Scala file:
(screenshot: Scala project structure)
The command is being run in target/scala-2.11
I also have these environment variables defined:
JAVA_HOME: C:\Program Files\Java\jdk1.8.0_251
HADOOP_HOME: C:\winutils
SPARK_HOME: C:\Users\Spark
SPARK_USER: C:\Users\Spark
build.sbt:
name := "DataQuality"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-mllib" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)
// https://github.com/awslabs/deequ
libraryDependencies += "com.amazon.deequ" % "deequ" % "1.0.2-antarctic" from "file:///C:/Users/AKAINIX ANALYTICS/Documents/Lucas/Antarctic/Bitbucket/plataforma-dataquality/Backend/deequ/target/deequ-1.0.2-antarctic.jar"
//https://github.com/spray/spray-json
//libraryDependencies += "io.spray" %% "spray-json" % "1.3.5"
//https://github.com/scopt/scopt
libraryDependencies += "com.github.scopt" %% "scopt" % "4.0.0-RC2"
//https://circe.github.io/circe/
val circeVersion = "0.7.0"
libraryDependencies ++= Seq(
"io.circe" %% "circe-core" % circeVersion,
"io.circe" %% "circe-generic" % circeVersion,
"io.circe" %% "circe-parser" % circeVersion
)
// https://mvnrepository.com/artifact/mysql/mysql-connector-java
libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.19"
// https://mvnrepository.com/artifact/org.postgresql/postgresql
libraryDependencies += "org.postgresql" % "postgresql" % "42.2.5"
//https://github.com/lightbend/config
libraryDependencies += "com.typesafe" % "config" % "1.4.0"
// https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.0"

Jackson version is too old

I have the following build.sbt file:
name := "myProject"
version := "1.0"
scalaVersion := "2.11.8"
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
dependencyOverrides ++= Set(
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.1"
)
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-hive" % "2.0.0" % "provided",
"com.databricks" %% "spark-csv" % "1.4.0",
"org.scalactic" %% "scalactic" % "2.2.1",
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"org.scalacheck" %% "scalacheck" % "1.12.4",
"com.holdenkarau" %% "spark-testing-base" % "2.0.0_0.4.4" % "test",
)
However, when I am running the code, I get this error:
An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.4
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:549)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 58 more
Why is this the case?
I've added a newer version of Jackson to dependencyOverrides (after looking here: Spark Parallelize? (Could not find creator property with name 'id')), so an older version shouldn't be used.
jackson-core and jackson-databind versions should match (at least up to the minor version, I believe).
So remove the dependencyOverrides and have
libraryDependencies ++= Seq(
...
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.1"
)
Or specify both in dependencyOverrides
dependencyOverrides ++= Set(
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.1",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.1"
)
Though I'm not sure I understand what you are trying to do; the linked question seems to say that you should use an older version (2.4.4).
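If the mismatch persists, note that the "Jackson version is too old" check in the stack trace comes from jackson-module-scala, so pinning that module alongside jackson-core and jackson-databind can also help. A sketch; the version numbers are illustrative and only need to match each other:
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.8.1",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.1",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.1"
)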

Spray microservice assembly deduplicate

I'm using this template to develop a microservice:
http://www.typesafe.com/activator/template/activator-service-container-tutorial
My sbt file is like this:
import sbt._
import Keys._
name := "activator-service-container-tutorial"
version := "1.0.1"
scalaVersion := "2.11.6"
crossScalaVersions := Seq("2.10.5", "2.11.6")
resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"
libraryDependencies ++= {
val containerVersion = "1.0.1"
val configVersion = "1.2.1"
val akkaVersion = "2.3.9"
val liftVersion = "2.6.2"
val sprayVersion = "1.3.3"
Seq(
"com.github.vonnagy" %% "service-container" % containerVersion,
"com.github.vonnagy" %% "service-container-metrics-reporting" % containerVersion,
"com.typesafe" % "config" % configVersion,
"com.typesafe.akka" %% "akka-actor" % akkaVersion exclude ("org.scala-lang" , "scala-library"),
"com.typesafe.akka" %% "akka-slf4j" % akkaVersion exclude ("org.slf4j", "slf4j-api") exclude ("org.scala-lang" , "scala-library"),
"ch.qos.logback" % "logback-classic" % "1.1.3",
"io.spray" %% "spray-can" % sprayVersion,
"io.spray" %% "spray-routing" % sprayVersion,
"net.liftweb" %% "lift-json" % liftVersion,
"com.typesafe.akka" %% "akka-testkit" % akkaVersion % "test",
"io.spray" %% "spray-testkit" % sprayVersion % "test",
"junit" % "junit" % "4.12" % "test",
"org.scalaz.stream" %% "scalaz-stream" % "0.7a" % "test",
"org.specs2" %% "specs2-core" % "3.5" % "test",
"org.specs2" %% "specs2-mock" % "3.5" % "test",
"com.twitter" %% "finagle-http" % "6.25.0",
"com.twitter" %% "bijection-util" % "0.7.2"
)
}
scalacOptions ++= Seq(
"-unchecked",
"-deprecation",
"-Xlint",
"-Ywarn-dead-code",
"-language:_",
"-target:jvm-1.7",
"-encoding", "UTF-8"
)
crossPaths := false
parallelExecution in Test := false
assemblyJarName in assembly := "santo.jar"
mainClass in assembly := Some("Service")
The project compiles fine!
But when I run assembly, the terminal shows me this:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /path/.ivy2/cache/io.dropwizard.metrics/metrics-core/bundles/metrics-core-3.1.1.jar:com/codahale/metrics/ConsoleReporter$1.class
[error] /path/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.1.jar:com/codahale/metrics/ConsoleReporter$1.class
What options do I have to fix it?
Thanks
The issue, as it seems, is that transitive dependencies of your dependencies are pulling in two different versions of metrics-core. The best thing to do would be to pick library dependencies such that you end up with a single version of this library. If it is difficult to figure out the dependencies, use https://github.com/jrudolph/sbt-dependency-graph.
If it is not possible to get to a single version, you will most likely have to go down the exclude route. I assume this only works if the required versions are compatible with each other.
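A minimal sketch of that exclude route, assuming the dependency graph confirms that the old com.codahale.metrics artifact (3.0.1) is the one that can safely be dropped in favour of io.dropwizard.metrics (3.1.1), and that you are on sbt 0.13.8 or newer (as noted earlier on this page):
excludeDependencies += ExclusionRule("com.codahale.metrics", "metrics-core")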