How do I exclude/include specific packages using xsbt-proguard-plugin? - scala

I'm using xsbt-proguard-plugin, which is an SBT plugin for working with Proguard.
I'm trying to come up with a Proguard configuration for a Hive Deserializer I've written, which has the following dependencies:
// project/Dependencies.scala
val hadoop = "org.apache.hadoop" % "hadoop-core" % V.hadoop
val hive = "org.apache.hive" % "hive-common" % V.hive
val serde = "org.apache.hive" % "hive-serde" % V.hive
val httpClient = "org.apache.httpcomponents" % "httpclient" % V.http
val logging = "commons-logging" % "commons-logging" % V.logging
val specs2 = "org.specs2" %% "specs2" % V.specs2 % "test"
Plus an unmanaged dependency:
// lib/UserAgentUtils-1.6.jar
Because most of these are either for local unit testing or are available within a Hadoop/Hive environment anyway, I want my minified jarfile to only include:
The Java classes SnowPlowEventDeserializer.class and SnowPlowEventStruct.class
org.apache.httpcomponents.httpclient
commons-logging
lib/UserAgentUtils-1.6.jar
But I'm really struggling to get the syntax right. Should I start from a whitelist of classes I want to keep, or explicitly filter out the Hadoop/Hive/Serde/Specs2 libraries? I'm aware of this SO question but it doesn't seem to apply here.
If I initially try the whitelist approach:
// Should be equivalent to sbt> package
import ProguardPlugin._
lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)
Then I get a Hadoop processing error, so clearly Proguard is still trying to bundle Hadoop:
proguard: java.lang.IllegalArgumentException: Can't find common super class of [[Lorg/apache/hadoop/fs/FileStatus;] and [[Lorg/apache/hadoop/fs/s3/Block;]
Meanwhile, if I try Proguard's filtering syntax to build up the blacklist of libraries I don't want to include:
import ProguardPlugin._
lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-injars !*hadoop*.jar"
  )
)
Then this doesn't seem to work either:
proguard: java.io.IOException: Can't read [/home/dev/snowplow-log-deserializers/!*hadoop*.jar] (No such file or directory)
Any help greatly appreciated!

The whitelist is the proper approach: ProGuard should get a complete context, so it can properly shake out classes, fields, and methods that are not needed.
The error "Can't find common super class" suggests that some library is still missing from the input. ProGuard probably warned about it, but the configuration appears to contain the option -ignorewarnings or -dontwarn (which should be avoided). You should add the library with -injars or -libraryjars.
If ProGuard then includes some classes that you weren't expecting in the output, you can get an explanation with "-whyareyoukeeping class somepackage.SomeUnexpectedClass".
Starting from a working configuration, you can still try to filter out classes or entire jars from the input. Filters are added to items in a class path though, not on their own, e.g. "-injars some.jar(!somepackage/**.class)" (cf. the manual). This can be useful if the input contains test classes that drag in other unwanted classes.
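To make the filter placement concrete, here is a minimal, untested sketch using the question's own plugin keys (the hadoop jar path and version are hypothetical): filters attach to a named class-path entry, and libraries that should be visible but not bundled go on the library path rather than the input path.
// Sketch only: resolve Hadoop without bundling it, and filter one input jar
lazy val proguard = proguardSettings ++ Seq(
  proguardOptions := Seq(
    "-libraryjars lib/hadoop-core-1.0.3.jar",  // hypothetical path: visible to ProGuard, not bundled
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-injars some.jar(!somepackage/**.class)"  // the filter attaches to a concrete jar, not a bare pattern
  )
)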

In the end, I couldn't get past duplicate class errors using Proguard, let alone figure out how to filter out the relevant jars, so I finally switched to a much cleaner sbt-assembly approach:
1. Added the sbt-assembly plugin to my project as per the README
2. Updated the appropriate project dependencies with a "provided" flag to stop them being added into my fat jar:
val hadoop = "org.apache.hadoop" % "hadoop-core" % V.hadoop % "provided"
val hive = "org.apache.hive" % "hive-common" % V.hive % "provided"
val serde = "org.apache.hive" % "hive-serde" % V.hive % "provided"
val httpClient = "org.apache.httpcomponents" % "httpclient" % V.http
val httpCore = "org.apache.httpcomponents" % "httpcore" % V.http
val logging = "commons-logging" % "commons-logging" % V.logging % "provided"
val specs2 = "org.specs2" %% "specs2" % V.specs2 % "test"
3. Added an sbt-assembly configuration like so:
import sbtassembly.Plugin._
import AssemblyKeys._
lazy val sbtAssemblySettings = assemblySettings ++ Seq(
  assembleArtifact in packageScala := false,
  jarName in assembly <<= (name, version) { (name, version) => name + "-" + version + ".jar" },
  mergeStrategy in assembly <<= (mergeStrategy in assembly) {
    (old) => {
      case "META-INF/NOTICE.txt" => MergeStrategy.discard
      case "META-INF/LICENSE.txt" => MergeStrategy.discard
      case x => old(x)
    }
  }
)
Then typing assembly produced a "fat jar" with just the packages I needed in it, including the unmanaged dependency and excluding Hadoop/Hive etc.

Related

java.lang.VerifyError: Operand stack overflow for google-ads API and SBT

I am trying to migrate from the Google AdWords API to the google-ads-v10 API in Spark 3.1.1 on EMR.
I am facing some dependency issues due to conflicts with existing jars.
Initially, we were facing a dependency issue related to the Protobuf jar:
Exception in thread "grpc-default-executor-0" java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.google.ads.googleads.v10.services.SearchGoogleAdsRequest
at com.google.ads.googleads.v10.services.SearchGoogleAdsRequest.getSerializedSize(SearchGoogleAdsRequest.java:394)
at io.grpc.protobuf.lite.ProtoInputStream.available(ProtoInputStream.java:108)
To resolve this, I tried to shade the Protobuf classes and build an uber-jar instead. After the shading, running my project locally in IntelliJ works fine, but when I try to run the executable jar I created, I get the following error:
Exception in thread "main" io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
I tried adding all those libraries in --spark.jars.packages but it didn't help.
java.lang.VerifyError: Operand stack overflow
Exception Details:
Location:
io/grpc/internal/TransportTracer.getStats()Lio/grpc/InternalChannelz$TransportStats; ...
...
...
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.<init>(NettyChannelBuilder.java:96)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forTarget(NettyChannelBuilder.java:169)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forAddress(NettyChannelBuilder.java:152)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:38)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:24)
at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:348)
Has anyone ever encountered such an issue?
Build.sbt
lazy val dependencies = new {
  val sparkRedshift = "io.github.spark-redshift-community" %% "spark-redshift" % "5.0.3" % "provided" excludeAll (ExclusionRule(organization = "com.amazonaws"))
  val jsonSimple = "com.googlecode.json-simple" % "json-simple" % "1.1" % "provided"
  val googleAdsLib = "com.google.api-ads" % "google-ads" % "17.0.1"
  val jedis = "redis.clients" % "jedis" % "3.0.1" % "provided"
  val sparkAvro = "org.apache.spark" %% "spark-avro" % sparkVersion % "provided"
  val queryBuilder = "com.itfsw" % "QueryBuilder" % "1.0.4" % "provided" excludeAll (ExclusionRule(organization = "com.fasterxml.jackson.core"))
  val protobufForGoogleAds = "com.google.protobuf" % "protobuf-java" % "3.18.1"
  val guavaForGoogleAds = "com.google.guava" % "guava" % "31.1-jre"
}

libraryDependencies ++= Seq(
  dependencies.sparkRedshift, dependencies.jsonSimple, dependencies.googleAdsLib,
  dependencies.guavaForGoogleAds, dependencies.protobufForGoogleAds,
  dependencies.jedis, dependencies.sparkAvro,
  dependencies.queryBuilder
)

dependencyOverrides ++= Set(
  dependencies.guavaForGoogleAds
)

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.protobuf.@1").inAll
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case PathList("module-info.class", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
I had a similar issue and I changed the assembly merge strategy to this:
assemblyMergeStrategy in assembly := {
  case x if x.contains("io.netty.versions.properties") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
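If the ProviderNotFoundException persists after assembly, one more thing worth checking (an assumption, not something confirmed in this thread) is the wholesale discard of META-INF in the question's merge strategy: it also drops the META-INF/services files through which grpc discovers channel providers via ServiceLoader. A minimal sketch of a strategy that keeps them:
assemblyMergeStrategy in assembly := {
  // concatenate ServiceLoader registrations so grpc can still find a provider
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}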
Solved this by using the google-ads-shadowjar as an external jar rather than having a dependency on the google-ads library. This solves the problem of having to deal with dependencies manually, but makes your jar size bigger.
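For reference, a minimal sketch of wiring such a pre-shaded jar into the build (the directory and jar name here are hypothetical; jars dropped straight into lib/ are picked up by sbt automatically):
// add the shaded google-ads jar to the compile classpath as an unmanaged jar
unmanagedJars in Compile += Attributed.blank(file("external/google-ads-shadowjar-17.0.1.jar"))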

build.sbt: exclude dependencies from dependsOn submodule

There is a similar question here, but that solution does not work in sbt v1.x.
It is well documented how to exclude dependencies when they are added through libraryDependencies in build.sbt:
libraryDependencies += "log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")
or preventing transitive dependencies:
libraryDependencies += "org.apache.felix" % "org.apache.felix.framework" % "1.8.0" intransitive()
but my question is how (and if) it can be done when declaring dependsOn dependencies of submodules in a multi-module project like this:
lazy val core = project.dependsOn(util)
How would I do something like this (invalid code in example below) to prevent a transitive dependency from being brought in via util:
lazy val core = project.dependsOn(util exclude("javax.jms", "jms"))
also how, and more importantly, how to exclude a transitive dependency on another submodule in the multi-module project from being brought in via util (where sub3 is another submodule project declared in the same build.sbt):
lazy val core = project.dependsOn(util exclude sub3)
The way to do it is to use the excludeDependencies setting key.
A short example:
excludeDependencies ++= Seq(
  ExclusionRule("commons-logging", "commons-logging")
)
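Applied to the question's javax.jms example, a minimal sketch (the project layout is assumed) would look like the following; note that excludeDependencies applies to the project's whole managed dependency graph, not only to what arrives via util:
lazy val core = (project in file("core"))
  .dependsOn(util)
  .settings(
    // strip javax.jms:jms wherever it enters core's dependency graph
    excludeDependencies += ExclusionRule("javax.jms", "jms")
  )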
If you happen to define your dependencies as val (as I do), you might find it useful to define the excludes based on your dependencies. To do so, you need this simple method:
def excl(m: ModuleID): InclExclRule = InclExclRule(m.organization, m.name)
and it allows for easy exclusions:
val theLib = "com.my.lib" % "artifact" % "version"

lazy val `projectA` = (project in file("projectA"))
  .settings(
    ...
    libraryDependencies ++= Seq(
      theLib
    )
  )

lazy val `projectB` = (project in file("projectB"))
  .settings(
    ...
    libraryDependencies ++= Seq(
      ...
    ),
    excludeDependencies ++= Seq(
      excl(theLib)
    )
  )
  .dependsOn(projectA)

Can't use breeze even after I added dependencies to build.sbt

I was following this tutorial on installing breeze, but I can't get it to work.
My directory structure:
myproject/
  build.sbt
  project/
    Build.scala  # This is empty
  src/
    main/
      scala/
        hello.scala
    test/
      scala/
        my_tests.scala
My build.sbt looks like this (it is mostly copied from the tutorial):
name := "My project"
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.0"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.0" % "test"
libraryDependencies ++= Seq(
  // other dependencies here
  "org.scalanlp" %% "breeze" % "0.12",
  // native libraries are not included by default. add this if you want them (as of 0.7)
  // native libraries greatly improve performance, but increase jar sizes.
  // It also packages various blas implementations, which have licenses that may or may not
  // be compatible with the Apache License. No GPL code, as best I know.
  "org.scalanlp" %% "breeze-natives" % "0.12",
  // the visualization library is distributed separately as well.
  // It depends on LGPL code.
  "org.scalanlp" %% "breeze-viz" % "0.12"
)

resolvers ++= Seq(
  // other resolvers here
  // if you want to use snapshot builds (currently 0.12-SNAPSHOT), use this.
  "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
  "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
)
// or 2.11.5
scalaVersion := "2.11.8"
And my hello.scala file looks like this:
package mypackage
import breeze.linalg._
object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello World")
    val x = Dense.Vector.zeros[Double](5)
    println(x)
  }
}
The error that I get looks like this:
not found: value Dense
[error]     val x = Dense.Vector.zeros[Double](5)
[error]             ^
I know that I am adding the unit-test libraries to libraryDependencies correctly, because my unit tests work after I add them. But what am I doing wrong when adding the dependencies for breeze? What steps should I take to narrow down the problem?
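Judging from the compiler message alone, the likely culprit is the call itself rather than the dependency setup: breeze.linalg defines DenseVector as a single identifier, so Dense.Vector does not resolve. A minimal sketch of the corrected line (inside main), assuming breeze 0.12 as declared above:
import breeze.linalg.DenseVector

val x = DenseVector.zeros[Double](5) // a 5-element zero vector
println(x)                           // DenseVector(0.0, 0.0, 0.0, 0.0, 0.0)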

How to override dependency on certain task in sbt

I want to override a dependency on a project for a certain task.
I have an sbt multi-project build which uses Spark.
lazy val core = // Some Project

val sparkLibs = Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1"
)

val sparkLibsProvided = Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
)

lazy val main = Project(
  id = "main",
  base = file("main-project"),
  settings = sharedSettings
).settings(
  name := "main",
  libraryDependencies ++= sparkLibs,
  dependencyOverrides ++= Set(
    "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
  )
).dependsOn(core)
When I try to make a fat jar to submit to my YARN cluster, I use the sbt-assembly task (https://github.com/sbt/sbt-assembly). But in this case, I want to use sparkLibsProvided instead of sparkLibs, something like:
lazy val sparkProvided = (project in assembly).settings(
  dependencyOverrides ++= sparkLibsProvided.toSet
)
How can I properly override this dependency?
You can create a new project which is a dedicated project for creating your spark uber jar with the provided flag:
lazy val sparkUberJar = (project in file("spark-project"))
  .settings(sharedSettings: _*)
  .settings(
    libraryDependencies ++= sparkLibsProvided,
    dependencyOverrides ++= Set(
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
    )
  )
And when you assemble, switch to the said project first. Project selection does not persist across separate sbt invocations, so run both commands in one sbt session (or in one shot: sbt "project sparkUberJar" assembly):
> project sparkUberJar
> assembly
This can be easily achieved by using the key provided specifically for what you want:
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter {
    _.data.getName == "spark-core-1.6.1.jar"
  }
}
This approach is considered hacky, however, and it would be better if you managed to split your configuration into subprojects, as the official documentation here also warns:
If you need to tell sbt-assembly to ignore JARs, you're probably doing it wrong. assembly task grabs deps JARs from your project's classpath. Try fixing the classpath first.

Play 2.4.1, PlayEbean not found

After updating my Java project from Play 2.2 to 2.4, I followed the instructions on the Migration page, but am getting an error saying the value PlayEbean was not found.
What am I doing wrong? As far as I can tell I only have to add that one line to the plugins.sbt file and it should work, right?
EDIT: I tried 2.4.2; the exact same problem occurred.
For clarity's sake: there is no build.sbt file, only a Build.scala file plus BuildKeys.scala and BuildPlugin.scala files, though those last two have no relation to this problem.
The files:
project/Build.scala:
import sbt._
import Keys._
import play.sbt.PlayImport._
import PlayKeys._
object BuildSettings {
  val appVersion = "0.1"
  val buildScalaVersion = "2.11.7"

  val buildSettings = Seq(
    version := appVersion,
    scalaVersion := buildScalaVersion
  )
}

object Resolvers {
  val typeSafeRepo = "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
  val localRepo = "Local Maven Repositor" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
  val bintrayRepo = "scalaz-bintray" at "https://dl.bintray.com/scalaz/releases"
  val sbtRepo = "Public SBT repo" at "https://dl.bintray.com/sbt/sbt-plugin-releases/"

  val myResolvers = Seq(
    typeSafeRepo,
    localRepo,
    bintrayRepo,
    sbtRepo
  )
}

object Dependencies {
  val mindrot = "org.mindrot" % "jbcrypt" % "0.3m"
  val libThrift = "org.apache.thrift" % "libthrift" % "0.9.2"
  val commonsLang3 = "org.apache.commons" % "commons-lang3" % "3.4"
  val commonsExec = "org.apache.commons" % "commons-exec" % "1.3"
  val guava = "com.google.guava" % "guava" % "18.0"
  val log4j = "org.apache.logging.log4j" % "log4j-core" % "2.3"
  val jacksonDataType = "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.5.3"
  val jacksonDataformat = "com.fasterxml.jackson.dataformat" % "jackson-dataformat-xml" % "2.5.3"
  val postgresql = "postgresql" % "postgresql" % "9.3-1103.jdbc41"

  val myDeps = Seq(
    // Part of play
    javaCore,
    javaJdbc,
    javaWs,
    cache,
    // User defined
    mindrot,
    libThrift,
    commonsLang3,
    commonsExec,
    guava,
    log4j,
    jacksonDataType,
    jacksonDataformat,
    postgresql
  )
}

object ApplicationBuild extends Build {
  import Resolvers._
  import Dependencies._
  import BuildSettings._

  val appName = "sandbox"

  val main = Project(
    appName,
    file("."),
    settings = buildSettings ++ Seq(resolvers := myResolvers, libraryDependencies := myDeps)
  )
    .enablePlugins(play.PlayJava, PlayEbean)
    .settings(jacoco.settings: _*)
    .settings(parallelExecution in jacoco.Config := false)
    .settings(javaOptions in Test ++= Seq("-Xmx512M"))
    .settings(javaOptions in Test ++= Seq("-XX:MaxPermSize=512M"))
}
project/plugins.sbt:
// Use the Play sbt plugin for Play projects
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.4.1")
// The Typesafe repository
resolvers ++= Seq(
  "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/",
  "Local Maven Repositor" at "file://" + Path.userHome.absolutePath + "/.m2/repository",
  "scalaz-bintray" at "https://dl.bintray.com/scalaz/releases",
  "Public SBT repo" at "https://dl.bintray.com/sbt/sbt-plugin-releases/"
)

libraryDependencies ++= Seq(
  "com.puppycrawl.tools" % "checkstyle" % "6.8",
  "com.typesafe.play" %% "play-java-ws" % "2.4.1",
  "org.jacoco" % "org.jacoco.core" % "0.7.1.201405082137" artifacts (Artifact("org.jacoco.core", "jar", "jar")),
  "org.jacoco" % "org.jacoco.report" % "0.7.1.201405082137" artifacts (Artifact("org.jacoco.report", "jar", "jar"))
)
// Plugin for code coverage
addSbtPlugin("de.johoop" % "jacoco4sbt" % "2.1.6")
// Play enhancer - this automatically generates getters/setters for public fields
// and rewrites accessors of these fields to use the getters/setters. Remove this
// plugin if you prefer not to have this feature, or disable on a per project
// basis using disablePlugins(PlayEnhancer) in your build.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-play-enhancer" % "1.1.0")
// Play Ebean support, to enable, uncomment this line, and enable in your build.sbt using
// enablePlugins(SbtEbean). Note, uncommenting this line will automatically bring in
// Play enhancer, regardless of whether the line above is commented out or not.
addSbtPlugin("com.typesafe.sbt" % "sbt-play-ebean" % "1.0.0")
I have tried adding javaEbean to the myDeps variable; the output remains the same.
Also, contrary to all the examples and tutorials, if I want to enable PlayJava, I have to do it via play.PlayJava. What is up with that?
For the error not found: value PlayEbean, you must import play.ebean.sbt.PlayEbean in Build.scala.
Then you will get a not-found error for jacoco, so you must import de.johoop.jacoco4sbt.JacocoPlugin.jacoco.
After that comes a NoClassDefFoundError; to fix it, upgrade sbt to 0.13.8 in project/build.properties.
Finally, the postgresql dependency is incorrect and doesn't resolve.
The sbt part should then work; in my case it fails later because I don't have Ebean models in the project.
Patch version:
diff a/project/Build.scala b/project/Build.scala
--- a/project/Build.scala
+++ b/project/Build.scala
@@ -1,3 +1,5 @@
+import de.johoop.jacoco4sbt.JacocoPlugin.jacoco
+import play.ebean.sbt.PlayEbean
 import play.sbt.PlayImport._
 import sbt.Keys._
 import sbt._
@@ -35,7 +37,7 @@
   val log4j = "org.apache.logging.log4j" % "log4j-core" % "2.3"
   val jacksonDataType = "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.5.3"
   val jacksonDataformat = "com.fasterxml.jackson.dataformat" % "jackson-dataformat-xml" % "2.5.3"
-  val postgresql = "postgresql" % "postgresql" % "9.3-1103.jdbc41"
+  val postgresql = "org.postgresql" % "postgresql" % "9.3-1103-jdbc41"

   val myDeps = Seq(
     // Part of play
diff a/project/build.properties b/project/build.properties
--- a/project/build.properties
+++ b/project/build.properties
@@ -1,1 +1,1 @@
-sbt.version=0.13.5
+sbt.version=0.13.8
EDIT: How did I end up with this: the latest versions of the Scala plugin for IntelliJ IDEA allow better editing of sbt configs than before, but (for now) one needs to get the sbt project to build once before importing it (i.e. by commenting out suspicious lines). Once the project is imported, one can use autocompletion, auto-import, and other joys. I hope it will be useful with crossScalaVersions. About that, keep in mind that Play 2.4 is Java 8+ only and that Scala 2.10 doesn't fully support Java 8 (see the first section of the "Play 2.4 Migration Guide").