sbt assembly with JUnit test failures - Scala

I am very new to Scala and sbt.
I wanted to run JUnit tests with sbt assembly.
I designed all my tests and they all run correctly in IntelliJ.
When I try to build with the tests, it always fails, giving lots of errors.
Here is my build.sbt
name := "updater"
version := "0.1-SNAPSHOT"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
//"org.scala-lang" % "scala-reflect" % "2.11.12",
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"com.typesafe" % "config" % "1.3.4",
//Testing
"junit" % "junit" % "4.10" % Test,
"com.novocode" % "junit-interface" % "0.11" % Test
// exclude("junit", "junit-dep")
,
//"org.scalatest" %% "scalatest" % "3.0.7" % Test,
"org.easymock" % "easymock" % "4.0.2" % Test,
//Logging
"ch.qos.logback" % "logback-classic" % "1.2.3",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0"
)
assemblyMergeStrategy in assembly := {
case PathList("src/test/resources/library.properties", xs#_*) => MergeStrategy.discard
case PathList("META-INF", xs#_*) => MergeStrategy.discard
case x => MergeStrategy.first
}
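(Side note, not a fix for the failing tests themselves: with sbt-assembly 0.14.x the assembly task runs the test task first by default, so if the goal were only to build the jar without running the tests, they could be skipped with a setting along these lines.)
// optional: skip running tests when building the fat jar (sbt-assembly 0.14.x syntax)
test in assembly := {}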
I am attaching the log file, because the problem, to me as a newbie, is not understandable. It is driving me crazy.
This is my abstract test class, which is supposed to initialize a Spark context with @BeforeClass in every test class. I only included it because I suspect it could be the cause of the failure.
Do you have any suggestions on how to solve it?
Thanks

I was instantiating the class like so:
import org.apache.spark.sql.SparkSession
import org.junit.{AfterClass, BeforeClass}
abstract class SparkTest {
val spark: SparkSession = SparkTest.spark
}
object SparkTest {
var spark: SparkSession = _
@BeforeClass
def initializeSpark(): Unit = {
spark = SparkSession
.builder()
.appName("TableUpdaterTest")
.master("local")
.getOrCreate()
}
@AfterClass
def stopSpark(): Unit = {
spark.stop()
}
}
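For context, a concrete test class built on this base might look like the following; this is a minimal sketch, and the class name and the assertion are illustrative only, not taken from the actual project.
import org.junit.Assert.assertEquals
import org.junit.Test

class TableUpdaterSomeTest extends SparkTest {
  @Test
  def countsRows(): Unit = {
    import spark.implicits._
    // build a tiny DataFrame from a local Seq and check its row count
    val df = Seq(1, 2, 3).toDF("value")
    assertEquals(3L, df.count())
  }
}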
Apparently, by commenting out the spark.stop() everything started to work.
Does anyone have an idea why?

Related

Import sbt and typesafe in build (IntelliJ)

I can't import the sbt and typesafe libraries into build.sbt in IntelliJ.
The sbt and typesafe dependencies are in the plugin.sbt file, and in that file the addSbtPlugin method is also shown in red, while the imports of the libraries are inside the build.sbt file.
How can I fix this?
Update
The build.sbt file is this:
import com.typesafe.sbt.license.{DepModuleInfo, LicenseCategory, LicenseInfo}
import sbt._
import scala.io.Source
// Core library versions (the ones that are used multiple times)
val sparkVersion: String = "2.3.1"
val slf4jVersion: String = "1.7.25"
val logbackVersion: String = "1.2.3"
// Artifactory settings
val artifactoryRealm: String = "artifactory-espoo1.int.net.nokia.com"
val artifactoryUrl: String = s"https://$artifactoryRealm/artifactory/"
val artifactoryUser: Option[String] = sys.env.get("ARTIFACTORY_USER")
val artifactoryPassword: Option[String] = sys.env.get("ARTIFACTORY_PASSWORD")
// Project variables
val organizationId: String = "com.nokia.gs.npo.ae"
val rootPackage: String = organizationId + ".rfco"
// Base settings shared across modules
val baseSettings: Seq[SettingsDefinition] = Seq(
organization := organizationId,
version := Source.fromFile(file("VERSION")).mkString.trim + sys.env.getOrElse("VERSION_TAG", ""),
scalaVersion := "2.11.12",
buildInfoUsePackageAsPath := true,
scalafmtOnCompile in ThisBuild := false, // just invoke `sbt scalafmt` before commits!
parallelExecution in ThisBuild := false,
fork in Test := true,
testForkedParallel in Test := true,
logLevel in test := util.Level.Info,
coverageMinimum := sys.env.getOrElse("COVERAGE_MINIMUM", "80.0").toDouble,
coverageFailOnMinimum := true,
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-hive" % sparkVersion % Provided,
"org.slf4j" % "slf4j-api" % slf4jVersion % Compile,
"com.nokia.gs.ncs.chubs.common" %% "spark-commons" % "0.5.10" % Compile,
"com.nokia.gs.ncs.chubs.common" %% "lang" % "0.2.0" % Compile,
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.8" % Compile,
"com.typesafe.play" %% "play-json" % "2.7.1" % Compile,
"org.apache.commons" % "commons-csv" % "1.7" % Compile,
"org.scalatest" %% "scalatest" % "3.0.5" % Test,
"ch.qos.logback" % "logback-classic" % logbackVersion % Test,
"ch.qos.logback" % "logback-core" % logbackVersion % Test,
"org.apache.spark" %% "spark-hive-thriftserver" % sparkVersion % Test,
"com.github.tomakehurst" % "wiremock-standalone" % "2.22.0" % Test
),
excludeDependencies ++= Seq(
"com.fasterxml.jackson.module" % "jackson-module-scala",
"org.slf4j" % "slf4j-log4j12",
"org.hamcrest" % "hamcrest-core",
"javax.servlet" % "servlet-api"
),
publishTo := {
Some("Artifactory Realm" at artifactoryUrl + sys.env.getOrElse("ARTIFACTORY_LOCATION", "ava-maven-snapshots-local"))
},
packagedArtifacts in publish ~= { m =>
val classifiersToExclude = Set(Artifact.SourceClassifier)
m.filter({ case (art, _) => art.classifier.forall(c => !classifiersToExclude.contains(c)) })
},
(artifactoryUser, artifactoryPassword) match {
case (Some(user), Some(password)) =>
credentials += Credentials("Artifactory Realm", artifactoryRealm, user, password)
case _ =>
println("[info] USERNAME and/or PASSWORD is missing for publishing to Artifactory")
credentials := Seq()
}
)
Looking at your build.sbt, your plugins.sbt should contain at least these lines:
addSbtPlugin("com.typesafe.sbt" % "sbt-license-report" % "1.2.0")
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.9.0")
addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.2.1")
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1")
I solved the sbt import by creating a new project in IntelliJ IDEA and importing the source files into the project I created. Oddly, now it does the import. Before, however, I had opened the source folder with the code from IntelliJ, which did not perform the sbt import.
I only have to resolve the typesafe dependencies now, but that is a problem with external and private dependencies.
Thanks for your help

ScalaSpark: unable to create dataframe with scala-client dependency

I need to support Couchbase version 6 with Spark 2.3 or 2.4, and the Scala version is 2.11.12. I am facing an issue while creating a DataFrame.
SBT code snippet
scalaVersion := "2.11.12"
resolvers += "Couchbase Snapshots" at "http://files.couchbase.com/maven2"
val sparkVersion = "2.3.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.couchbase.client" %% "spark-connector" % "2.3.0",
"com.couchbase.client" %% "scala-client" % "1.0.0-alpha.3")
Code
val spark = SparkSession
.builder()
.appName("Example")
.master("local[*]")
.config("spark.couchbase.nodes", "10.12.12.88") // connect to Couchbase Server on localhost
.config("spark.couchbase.username", "abcd") // with given credentials
.config("spark.couchbase.password", "abcd")
.config("spark.couchbase.bucket.beer-sample", "") // open the travel-sample bucket
.getOrCreate()
val sc = spark.sparkContext
import com.couchbase.spark.sql._
val sql = spark.sqlContext
val dataframe = sql.read.couchbase()
val result = dataframe.collect()
Exception
Caused by: java.lang.ClassNotFoundException: com.couchbase.client.core.message.CouchbaseRequest
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
What I tried:
As per the suggestion I added the dependency
"com.couchbase.client" % "core-io" % "1.7.6",
Without the scala-client dependency I am able to get the DataFrame, but with scala-client I am unable to fix it. Please suggest a solution for this problem.
I have made the changes to your build.sbt file and have added settings for the sbt-assembly plugin.
scalaVersion := "2.11.12"
resolvers += "Couchbase Snapshots" at "http://files.couchbase.com/maven2"
val sparkVersion = "2.3.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"com.couchbase.client" %% "spark-connector" % "2.3.0")
assemblyJarName in assembly := s"${name.value}-${version.value}.jar"
assemblyMergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
case "reference.conf" => MergeStrategy.concat
case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
case _ => MergeStrategy.first
}
You need to create a file called plugins.sbt in the directory called project and add the following line to it:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
Once done, run the command sbt clean compile assembly in the project's root directory. It should build your jar.
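Because the Spark artifacts are marked Provided, the resulting fat jar is meant to be handed to an existing Spark 2.3.x installation rather than run on its own; as a rough usage sketch (the main class and jar name below are placeholders, not taken from your project):
spark-submit --class com.example.CouchbaseJob --master "local[*]" target/scala-2.11/<name>-<version>.jar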

Build sbt for spark with janusgraph and gremlin scala

I was trying to set up an IntelliJ build for Spark with JanusGraph using Gremlin-Scala, but I am running into errors.
My build.sbt file is:
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.1"
// https://mvnrepository.com/artifact/org.janusgraph/janusgraph-core
libraryDependencies += "org.janusgraph" % "janusgraph-core" % "0.2.0"
libraryDependencies ++= Seq(
"ch.qos.logback" % "logback-classic" % "1.2.3" % Test,
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
resolvers ++= Seq(
Resolver.mavenLocal,
"Sonatype OSS" at "https://oss.sonatype.org/content/repositories/public"
)
But I am getting errors when I try to compile code that uses the Gremlin-Scala libraries or io.Source. Can someone share their build file or tell me what I should modify to fix it?
Thanks in advance.
So, I was trying to compile this code:
import gremlin.scala._
import org.apache.commons.configuration.BaseConfiguration
import org.janusgraph.core.JanusGraphFactory
class Test1() {
val conf = new BaseConfiguration()
conf.setProperty("storage.backend", "inmemory")
val gr = JanusGraphFactory.open(conf)
val graph = gr.asScala()
graph.close
}
object Test{
def main(args: Array[String]) {
val t = new Test1()
println("in Main")
}
}
The errors I get are:
Error:(1, 8) not found: object gremlin
import gremlin.scala._
Error:(10, 18) value asScala is not a member of org.janusgraph.core.JanusGraph
val graph = gr.asScala()
If you go to the Gremlin-Scala GitHub page you'll see that the current version is "3.3.1.1" and that
Typically you just need to add a dependency on "com.michaelpollmeier" %% "gremlin-scala" % "SOME_VERSION" and one for the graph db of your choice to your build.sbt (this readme assumes tinkergraph). The latest version is displayed at the top of this readme in the maven badge.
It is not a surprise that the API has changed when the major version of the
library is different. If I change your first dependency like this:
//libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0" //old!
libraryDependencies += "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1"
then your example code compiles for me.
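Note also the switch from % to %%: with %% sbt appends the Scala binary version to the artifact name, so the line above resolves gremlin-scala_2.11, matching your scalaVersion 2.11.11. Written out explicitly (just for illustration), it is roughly equivalent to:
libraryDependencies += "com.michaelpollmeier" % "gremlin-scala_2.11" % "3.3.1.1"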

Why does sbt assembly in Spark project fail with "Please add any Spark dependencies by supplying the sparkVersion and sparkComponents"?

I work on an sbt-managed Spark project with a spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).
I've added the following line to build.sbt:
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
And so build.sbt looks as follows:
name := "Movie Rating"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= {
val sparkVersion = "1.6.0"
Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
"org.apache.kafka" % "kafka-clients" % "0.9.0.0",
"org.apache.kafka" %% "kafka" % "0.9.0.0",
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
)
}
assemblyMergeStrategy in assembly := {
case PathList("org", "apache", "spark", xs # _*) => MergeStrategy.first
case PathList("scala", xs # _*) => MergeStrategy.discard
case PathList("META-INF", "maven", "org.slf4j", xs # _* ) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
unmanagedBase <<= baseDirectory { base => base / "lib" }
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
When I execute sbt assembly I get the following error:
java.lang.RuntimeException: Please add any Spark dependencies by
supplying the sparkVersion and sparkComponents. Please remove:
org.apache.spark:spark-core:1.6.0:provided
Probably related: https://github.com/databricks/spark-csv/issues/150
Can you try adding spIgnoreProvided := true to your build.sbt?
(This might not be the answer and I could have just posted a comment but I don't have enough reputation)
NOTE I still can't reproduce the issue, but I think it does not really matter.
java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.
In your case, your build.sbt is missing an sbt resolver to find the spark-cloudant dependency. You should add the following line to build.sbt:
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
PROTIP I strongly recommend using spark-shell first and only when you're comfortable with the package switch to sbt (esp. if you're new to sbt and perhaps other libraries/dependencies too). It's too much to digest in one bite. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.
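For example (the coordinates are exactly the ones already used in build.sbt), the package can be pulled into a shell session with something along these lines:
spark-shell --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10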

assemblyMergeStrategy causing scala.MatchError when compiling

I'm new to sbt/assembly. I'm trying to resolve some dependency problems, and it seems the only way to do it is through a custom merge strategy. However, whenever I try to add a merge strategy I get a seemingly random MatchError on compiling:
[error] (*:assembly) scala.MatchError: org/apache/spark/streaming/kafka/KafkaUtilsPythonHelper$$anonfun$13.class (of class java.lang.String)
I'm showing this match error for the kafka library, but if I take out that library altogether, I get a MatchError on another library. If I take out all the libraries, I get a MatchError on my own code. None of this happens if I take out the "assemblyMergeStrategy" block. I'm clearly missing something incredibly basic, but for the life of me I can't find it and I can't find anyone else that has this problem. I've tried the older mergeStrategy syntax, but as far as I can read from the docs and SO, this is the proper way to write it now. Please help?
Here is my project/assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
And my project.sbt file:
name := "Clerk"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
"org.apache.kafka" %% "kafka" % "0.8.2.1",
"ch.qos.logback" % "logback-classic" % "1.1.7",
"net.logstash.logback" % "logstash-logback-encoder" % "4.6",
"com.typesafe.scala-logging" %% "scala-logging" % "3.1.0",
"org.apache.spark" %% "spark-streaming-kafka" % "1.6.1",
("org.apache.spark" %% "spark-streaming-kafka" % "1.6.1").
exclude("org.spark-project.spark", "unused")
)
assemblyMergeStrategy in assembly := {
case PathList("org.slf4j", "impl", xs # _*) => MergeStrategy.first
}
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
You're missing a default case for your merge strategy pattern match:
assemblyMergeStrategy in assembly := {
case PathList("org.slf4j", "impl", xs # _*) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}