How to prevent SBT from including test dependencies in the POM - scala

I have a small Scala utilities build with test classes under a dedicated test folder. Compiling and then running publish-local creates the package in my local repository.
As expected, the test folder is automatically excluded from the local jar of the utilities package.
However, the resulting POM still contains the related dependencies as defined in the SBT build. The SBT dependencies:
libraryDependencies ++= Seq(
  "org.scalactic" %% "scalactic" % "3.0.0" % Test,
  "org.scalatest" %% "scalatest" % "3.0.0" % Test
)
The segment of the POM:
<dependency>
  <groupId>org.scalactic</groupId>
  <artifactId>scalactic_2.11</artifactId>
  <version>3.0.0</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.11</artifactId>
  <version>3.0.0</version>
  <scope>test</scope>
</dependency>
The scope clearly needs to be test in order to prevent issues in another project (main) that uses this library; otherwise the main project's tests pull in these test libraries, which causes version conflicts and similar problems.
Since these dependencies exist only for the test package, which is not included in the jar, having them listed in the POM seems pointless. How do I tell SBT not to include these test-scoped dependencies in the final POM?

A similar question was asked here: sbt - exclude certain dependency only during publish.
Riffing on the answer provided by lyomi, here's how you can exclude all <dependency> elements that contain a child <scope> element, including test and provided.
import scala.xml.{Node => XmlNode, NodeSeq => XmlNodeSeq, _}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// skip dependency elements with a scope
pomPostProcess := { (node: XmlNode) =>
  new RuleTransformer(new RewriteRule {
    override def transform(node: XmlNode): XmlNodeSeq = node match {
      case e: Elem if e.label == "dependency"
          && e.child.exists(child => child.label == "scope") =>
        def txt(label: String): String =
          "\"" + e.child.filter(_.label == label).flatMap(_.text).mkString + "\""
        Comment(s""" scoped dependency ${txt("groupId")} % ${txt("artifactId")} % ${txt("version")} % ${txt("scope")} has been omitted """)
      case _ => node
    }
  }).transform(node).head
}
This should generate a POM that looks like this:
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.5</version>
  </dependency>
  <!-- scoped dependency "org.scalatest" % "scalatest_2.12" % "3.0.5" % "test" has been omitted -->
</dependencies>
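If you only want to drop test-scoped entries and keep, say, provided ones, a slightly narrower variant of the same rule works (a sketch along the same lines, using the same pomPostProcess hook; not tested against every sbt version):

import scala.xml.{Node => XmlNode, NodeSeq => XmlNodeSeq, _}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// drop only <dependency> elements whose <scope> is exactly "test"
pomPostProcess := { (node: XmlNode) =>
  new RuleTransformer(new RewriteRule {
    override def transform(node: XmlNode): XmlNodeSeq = node match {
      case e: Elem if e.label == "dependency" &&
          e.child.exists(c => c.label == "scope" && c.text.trim == "test") =>
        XmlNodeSeq.Empty
      case _ => node
    }
  }).transform(node).head
}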

Related

Incompatible Jackson version: 2.10.2 when trying to create an RDD of Row in Spark

I tried to create an RDD using sc.parallelize. It gave this exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:718)
at df_avro.SampleDf$.main(SampleDf.scala:25)
at df_avro.SampleDf.main(SampleDf.scala)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.2
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:808)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 4 more
This was the code:
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("TheApp")
val spark = SparkSession.builder()
  .config(conf)
  .getOrCreate()
val sc = spark.sparkContext

val rowArray: Array[Row] = Array(
  Row(1, "hello", true),
  Row(2, "goodbye", false)
)

val rows: RDD[Row] = sc.parallelize(rowArray)
println(rows.count())
Why is this causing an exception? Am I missing something?
Dependencies used:
val spark2Version = "2.2.1"

// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % spark2Version,
  "org.apache.spark" %% "spark-sql" % spark2Version,
  "org.apache.spark" %% "spark-streaming" % spark2Version
)
You can add the dependency overrides below to your build.sbt and try again.
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.6.5"
You can go to
com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
and see:
def setupModule(context: SetupContext) {
  val MajorVersion = version.getMajorVersion
  val MinorVersion = version.getMinorVersion
  context.getMapperVersion match {
    case version @ VersionExtractor(MajorVersion, minor) if minor < MinorVersion =>
      throw new JsonMappingException("Jackson version is too old " + version)
    case version @ VersionExtractor(MajorVersion, minor) =>
      // Under semantic versioning, this check would not be needed; however Jackson
      // occasionally has functionally breaking changes across minor versions
      // (2.4 -> 2.5 as an example). This may be the fault of the Scala module
      // depending on implementation details, so for now we'll just declare ourselves
      // as incompatible and move on.
      if (minor > MinorVersion) {
        throw new JsonMappingException("Incompatible Jackson version: " + version)
      }
    case version =>
      throw new JsonMappingException("Incompatible Jackson version: " + version)
  }
}
So it is clear: you are using a higher Jackson version than the Scala module supports; you can check the jar's version to confirm.
Changing your dependency versions can solve it.
I use Scala 2.11, Spark 2.4.1, and Hadoop 3.1.4.
Then I use:
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.6.7</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.6.7</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.6.7</version>
</dependency>
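Since this thread is otherwise sbt-centric, the equivalent overrides in build.sbt would look roughly like this (a sketch; pick the Jackson version that matches your Spark distribution, and note that on sbt 0.13 dependencyOverrides is a Set rather than a Seq):

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core"        % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-databind"    % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.6.7"
)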

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean

I'm trying to read a properties file and got stuck with the error given below. I have written a Scala package where I read the properties file and use it from the abc.scala program. Any help will be appreciated.
File:- xyz.properties
driver = "oracle.jdbc.driver.OracleDriver"
url = "jdbc:oracle:thin:#xxxx:1521/xxxx.xxxx"
username = "xxx"
password = "xxx"
input_file = "C:\\Users\\xxx\\test\\src\\main\\resources\\xxxx.xlsx"
build.sbt
name := "xxx.xxxx.xxxxx"
scalaVersion := "2.10.6"
ivyScala := ivyScala.value map{ _.copy(overrideScalaVersion = true) }
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"com.databricks" %% "spark-csv" % "1.5.0",
"org.apache.commons" % "commons-configuration2" % "2.1.1",
"commons-beanutils" % "commons-beanutils" % "1.9.3",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.scala-lang" % "scala-xml" % "2.11.0-M4" )
Package
package com.xxx.zzzz.xxx1

import java.io.File
import org.apache.commons.configuration2.builder.fluent.{Configurations, Parameters}

object Configuration {
  var config = new Configurations()
  var configs = config.properties(new File("xyz.properties"))
  var inputFile = configs.getString("input")
  var userName = configs.getString("user_name")
  var password = configs.getString("passwd")
  var driver = configs.getString("driver")
  var url = configs.getString("Url")
}
Main Program abc.scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.xxx.zzzz.xxx1.Configuration
import org.apache.commons.beanutils.PropertyUtils

object ItalyPanelData {
  def main(args: Array[String]): Unit = {
    //Logger.getRootLogger().setLevel(Level.OFF)
    println("Inside main program" + Configuration.driver)

    // Set the properties for spark to connect to the oracle database
    val dbProp = new java.util.Properties
    dbProp.setProperty("driver", Configuration.driver)
    dbProp.setProperty("user", Configuration.userName)
    dbProp.setProperty("password", Configuration.password)

    // Create a connection to connect spark
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // exception handling
    try {
      // Create dataframe object
      val df = sqlContext.read
        .option("location", Configuration.inputFile) // initiating input path
        .option("sheetName", "xyz")                  // give the sheet name
        .option("useHeader", "true")                 // take the header names from the excel sheet
        .option("treatEmptyValuesAsNulls", "true")
        .option("inferSchema", "true")
        .option("addColorColumns", "false")
        .load()

      // Write into the oracle database
      df.write.mode("append").jdbc(Configuration.url, "xyz", dbProp)
    } catch {
      case e: Throwable => e.printStackTrace()
    }
  }
}
Error
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
at org.apache.commons.configuration2.beanutils.BeanHelper.initBeanUtilsBean(BeanHelper.java:631)
at org.apache.commons.configuration2.beanutils.BeanHelper.<clinit>(BeanHelper.java:89)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.sun.proxy.$Proxy0.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
at org.apache.commons.configuration2.builder.fluent.Parameters.createParametersProxy(Parameters.java:294)
at org.apache.commons.configuration2.builder.fluent.Parameters.fileBased(Parameters.java:185)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:602)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:614)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileBasedBuilder(Configurations.java:132)
at org.apache.commons.configuration2.builder.fluent.Configurations.propertiesBuilder(Configurations.java:238)
at org.apache.commons.configuration2.builder.fluent.Configurations.properties(Configurations.java:282)
at com.rxcorp.italy.config.Configuration$.<init>(Configuration.scala:8)
at com.rxcorp.italy.config.Configuration$.<clinit>(Configuration.scala)
at com.rxcorp.paneldataloading.ItalyPanelData$.main(abc.scala:12)
Such exceptions are an indication of a version incompatibility.
Meaning: the code that you have written (or, more likely, one of the libraries under the surface) wants to call the method
org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(BeanIntrospector)
but at runtime the class file for PropertyUtilsBean does not contain that method.
Thus: you have to step back, understand the components in your stack, and check their version requirements on the Apache Commons BeanUtils library.
You get more ideas when looking at the javadoc for that method; it says "Since: 1.9".
In other words: this method was added in Commons BeanUtils 1.9, so some piece of your stack expects at least that version, but the classpath of the JVM that executes the whole thing has an older version.
So: check the classpath for Apache Commons BeanUtils; most likely you are fine after simply updating to a newer version. (And yes, that may still mean some "hard" debugging work, since your build settings do at least declare a newer version of the library.)
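If you want to pin the version from the build side, a minimal sketch (assuming the sbt build shown in the question; dependencyOverrides behaves slightly differently between sbt 0.13 and 1.x) would be:

// build.sbt: force the BeanUtils version that actually contains addBeanIntrospector (added in 1.9)
dependencyOverrides += "commons-beanutils" % "commons-beanutils" % "1.9.3"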
I guess I have a similar problem. Apache Commons Configuration 2.7 is used in our project together with Apache Commons BeanUtils 1.9.
Unfortunately, another library we use is jxls-reader 2.0.+, and that one references the commons-digester3 library.
Both beanutils 1.9 and the commons-digester3 lib package a class org.apache.commons.beanutils.PropertyUtilsBean, but commons-digester3's copy does not have the above-mentioned method, which leads to the same dilemma you have.
For now we are lucky in that our Windows servers load the "correct" version of beanutils first, whereas some developers using a Mac have it the other way around: the digester3 copy is loaded first, producing the same no-such-method error you see.
Not sure what our workaround will be here.
Anyway, check whether the class appears twice on your classpath and find out who is pulling it in by checking the pom.xml of every dependent library on the classpath (a quick runtime check is sketched below). Finally, you might be lucky enough to be able to remove some library if your code doesn't need it (chances are low, though).
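For reference, a quick way to see which jar the loaded class actually came from is a one-off snippet like this (an illustrative check, not part of the original answer; getCodeSource can be null for JDK-provided classes):

// Prints the jar (or directory) that the loaded PropertyUtilsBean class was read from
println(
  classOf[org.apache.commons.beanutils.PropertyUtilsBean]
    .getProtectionDomain.getCodeSource.getLocation
)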
Update 10th Nov: I excluded commons-digester3 from the jxls-reader dependency:
<dependency>
  <groupId>org.jxls</groupId>
  <artifactId>jxls-reader</artifactId>
  <version>2.0.3</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-digester3</artifactId>
    </exclusion>
  </exclusions>
</dependency>
This way the commons-digester artifact with the "with-deps" classifier from jxls-reader is not resolved, and I pull it in explicitly in our pom.xml, but only the normal jar without the repackaged classes of commons-logging, commons-beanutils, and so on.

Object streaming is not a member of package org.apache.spark

I'm trying to compile a simple Scala program that uses StreamingContext. Here is a snippet of my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._ // error: object streaming is not a member of package org.apache.spark

object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local")

    val sc = new SparkContext(conf)
    val textFile = sc.textFile(args(0))
    val ssc = new StreamingContext(sc, Seconds(10)) // error: not found: type StreamingContext
    sc.stop()
  }
}
I get these two errors:
object streaming is not a member of package org.apache.spark
and
not found: type StreamingContext
Any help, please!
If you are using sbt, add the following library dependencies:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided"
If you are using Maven, add the following to pom.xml:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.1.0</version>
  <scope>provided</scope>
</dependency>
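With the dependency on the classpath, the snippet from the question also needs to import the StreamingContext class itself (the original code only imports the members of its companion object). A minimal sketch of a version that compiles, assuming Spark 2.x:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileCount {
  def main(args: Array[String]): Unit = {
    // streaming needs at least two local threads: one for receiving, one for processing
    val conf = new SparkConf().setAppName("File Count").setMaster("local[2]")
    // a StreamingContext can be created directly from the SparkConf
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.stop()
  }
}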
You'll need to add the spark-streaming dependency to your build manager.
You need to add the dependency that corresponds to your import statement, and obviously you should already have the spark-streaming dependency in place; in addition to that, we need this dependency as well.
Here are the dependencies based on your dependency management tool.
For Maven: add the following to pom.xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
For SBT: add the following to build.sbt
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided"
For Gradle:
provided group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.1.0'
TIP: use grepcode.com to find the appropriate dependency by searching for your import statement. It is a nice site!
NOTE: dependency versions change and get updated over time.
I added the missing dependencies, and after that it worked for me. They are:
"org.apache.spark" %% "spark-mllib" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"

Play 2.0 - Build.scala - converting from pom.xml

I'm trying to migrate from a Nexus maven repo to using https://github.com/jcaddel/maven-s3-wagon. Getting on the wagon? I've read some things about build scripts for SBT, but that doesn't seem like what I want...am I missing something? Documentation is sparse.
Here is my Play! 2.0 Build.scala file:
import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName = "my-play-app"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    "org.fusesource.mqtt-client" % "mqtt-client" % "1.0")

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
    resolvers ++= Seq(
      "Maven Repository" at "http://repo1.maven.org/maven2/",
      "fusesource.snapshots" at "http://repo.fusesource.com/nexus/content/repositories/snapshots",
      "fusesource.releases" at "http://repo.fusesource.com/nexus/content/groups/public"))
}
Here is what I need to convert from the pom.xml file to Build.scala (via the wagon wiki):
<build>
  <extensions>
    <extension>
      <groupId>org.kuali.maven.wagons</groupId>
      <artifactId>maven-s3-wagon</artifactId>
      <version>[S3 Wagon Version]</version>
    </extension>
  </extensions>
</build>
And
<distributionManagement>
  <site>
    <id>s3.site</id>
    <url>s3://[AWS Bucket Name]/site</url>
  </site>
  <repository>
    <id>s3.release</id>
    <url>s3://[AWS Bucket Name]/release</url>
  </repository>
  <snapshotRepository>
    <id>s3.snapshot</id>
    <url>s3://[AWS Bucket Name]/snapshot</url>
  </snapshotRepository>
</distributionManagement>
I think I understand how to add the distribution portion to Build.scala:
import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName = "my-play-app"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    "org.fusesource.mqtt-client" % "mqtt-client" % "1.0")

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
    resolvers ++= Seq(
      "Maven Repository" at "http://repo1.maven.org/maven2/",
      "fusesource.snapshots" at "http://repo.fusesource.com/nexus/content/repositories/snapshots",
      "fusesource.releases" at "http://repo.fusesource.com/nexus/content/groups/public",
      "s3.site" at "s3://[AWS Bucket Name]/site",
      "s3.release" at "s3://[AWS Bucket Name]/release",
      "s3.snapshot" at "s3://[AWS Bucket Name]/snapshot"))
}
It looks like there is still no automatic S3 publish support in sbt (although there is an s3-plugin). But I think you can easily create your own, given that:
sbt can be enhanced with plugins
Maven plugins are just POJOs (so you can easily reuse them outside of maven)
There is an existing Maven plugin that already does what you want
I think you can
Use the sbt release plugin...
...add your own custom release step...
...that calls S3Wagon.putResource or S3Plugin.S3.s3Settings.upload (a rough sketch of such a custom step follows below)
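As an illustration of that idea, here is a hypothetical custom step built on the sbt-release plugin (a sketch: the step names come from sbt-release's ReleaseTransformations, while the upload body is only a placeholder, not the actual S3Wagon or S3Plugin API):

// project/plugins.sbt needs: addSbtPlugin("com.github.gseitz" % "sbt-release" % "<version>")
// build.sbt:
import ReleaseTransformations._

// Hypothetical custom step; the body is where the S3 upload of your choice would go
lazy val uploadToS3: ReleaseStep = ReleaseStep(action = st => {
  st.log.info("uploading published artifacts to S3...")
  // e.g. invoke the maven-s3-wagon classes or an S3 SDK here
  st
})

releaseProcess := Seq[ReleaseStep](
  checkSnapshotDependencies,
  inquireVersions,
  runTest,
  setReleaseVersion,
  publishArtifacts,
  uploadToS3,
  setNextVersion
)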
A combination of sbt-aether-deploy, maven-s3-wagon and fm-sbt-s3-resolver works well for me.
build.sbt:
publishMavenStyle := true

publishTo <<= version { v: String =>
  if (v.trim.endsWith("SNAPSHOT"))
    Some("Snapshots" at "s3://myrepo/snapshots")
  else
    Some("Releases" at "s3://myrepo/releases")
}

aetherSettings

aetherPublishSettings

wagons := Seq(aether.WagonWrapper("s3", "org.kuali.maven.wagon.S3Wagon"))
plugins.sbt:
addSbtPlugin("no.arktekk.sbt" % "aether-deploy" % "0.13")
addSbtPlugin("com.frugalmechanic" % "fm-sbt-s3-resolver" % "0.4.0")
libraryDependencies += "org.kuali.maven.wagons" % "maven-s3-wagon" % "1.2.1"
fm-sbt-s3-resolver is used for resolving s3 dependencies and aether for deploying. Deployment with fm-sbt-s3-resolver alone will not, AFAIK, generate and publish metadata (maven-metadata.xml).
sbt doesn't support Maven extensions, which is what gives you the s3:// protocol, so in short there is no easy way to do what you are trying to do.
There is an S3 Plugin for sbt available.

What does "str" % "str" mean in SBT?

I came across this code:
import sbt._

class AProject(info: ProjectInfo) extends DefaultProject(info) {
  val scalaToolsSnapshots = ScalaToolsSnapshots

  val scalatest = "org.scalatest" % "scalatest" %
    "1.0.1-for-scala-2.8.0.RC1-SNAPSHOT"
}
And I'm quite confused as to what scalatest contains, and what the % does.
It declares a dependency. In particular,
val scalatest = "org.scalatest" % "scalatest" % "1.0.1-for-scala-2.8.0.RC1-SNAPSHOT
refers to a dependency which can be found at
http://scala-tools.org/repo-snapshots/org/scalatest/scalatest/1.0.1-for-scala-2.8.0.RC1-SNAPSHOT/
Where everything before org refers to the repository, which is (pre-)defined elsewhere.
It is not easy to find the implicit that enables % on String, but, for the record, it is found on ManagedProject, converting a String into a GroupID. In the same trait there's also another implicit which enables the at method.
At any rate, the implicit will turn the first String into a GroupID, the first % will take a String representing the artifact ID and return a GroupArtifactID, and the second will take a String representing the revision and return a ModuleID, which is what finally gets assigned to scalatest.
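To make that chain concrete, here is a rough desugaring (a sketch against current sbt; the intermediate types have moved around between sbt versions, so only the final ModuleID is annotated):

import sbt._

// Each % consumes one coordinate; the full chain yields a ModuleID
val dep: ModuleID =
  "org.scalatest" % "scalatest" % "1.0.1-for-scala-2.8.0.RC1-SNAPSHOT"

// Morally equivalent to spelling the coordinates out by hand
val sameDep: ModuleID =
  ModuleID("org.scalatest", "scalatest", "1.0.1-for-scala-2.8.0.RC1-SNAPSHOT")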
If you have used Maven, this is essentially the same thing, just expressed with a Scala DSL; % works as a separator:
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest</artifactId>
<version>1.0.1-for-scala-2.8.0.RC1-SNAPSHOT</version>
</dependency>
Read more:
http://code.google.com/p/simple-build-tool/wiki/LibraryManagement