Log4j vulnerability while using Scala and Spark with sbt

I am working on a Scala Spark project, using the dependencies below:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-hive" % "2.2.0"
)
with scalaVersion set to:
ThisBuild / scalaVersion := "2.11.8"
and I am getting the error below:
[error] sbt.librarymanagement.ResolveException: unresolved dependency: org.apache.logging.log4j#log4j-api;2.11.1: Resolution failed several times for dependency: org.apache.logging.log4j#log4j-api;2.11.1 {compile=[compile(*), master(*)], runtime=[runtime(*)]}::
[error] typesafe-ivy-releases: unable to get resource for org.apache.logging.log4j#log4j-api;2.11.1: res=https://repo.typesafe.com/typesafe/ivy-releases/org.apache.logging.log4j/log4j-api/2.11.1/ivys/ivy.xml: java.io.IOException: Unexpected response code for CONNECT: 403
[error] sbt-plugin-releases: unable to get resource for org.apache.logging.log4j#log4j-api;2.11.1: res=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.apache.logging.log4j/log4j-api/2.11.1/ivys/ivy.xml: java.io.IOException: Unexpected response code for CONNECT: 403
Our security team has asked us to delete the vulnerable log4j-core jar, and since then the projects that use it as a transitive dependency have been failing.
Is there a way to upgrade just the log4j version without upgrading the Scala or Spark versions?
I need a way to force the build to stop fetching the vulnerable older log4j-core jar and to use version 2.17.2, which is not vulnerable, in its place.
I have tried:
dependencyOverrides += "org.apache.logging.log4j" % "log4j-core" % "2.17.2"
I have also tried sbt's excludeAll option on the Spark dependencies, but neither solution worked for me.
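For reference, the exclusion I tried looked roughly like the sketch below (the exact ExclusionRule shape is my assumption; the coordinates in my real build may differ):
// Sketch: strip the old log4j artifacts pulled in transitively,
// then declare the patched 2.17.2 artifacts explicitly.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "2.2.0")
    .excludeAll(ExclusionRule(organization = "org.apache.logging.log4j")),
  ("org.apache.spark" %% "spark-sql" % "2.2.0")
    .excludeAll(ExclusionRule(organization = "org.apache.logging.log4j")),
  "org.apache.logging.log4j" % "log4j-core" % "2.17.2",
  "org.apache.logging.log4j" % "log4j-api" % "2.17.2"
)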

I just made a few updates.
Added the settings below to my sbt project, updating project/build.properties and project/assembly.sbt respectively to newer versions:
sbt.version=1.6.2
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")
Added the log4j dependencies at the top so that any transitive dependency now resolves to the newer version.
Below is a sample snippet from one of my projects:
name := "project name"
version := "0.1"
scalaVersion := "2.11.8"
assemblyJarName in assembly := s"${name.value}-${version.value}.jar"
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.**" -> "shaded.@1").inAll
)
lazy val root = (project in file(".")).settings(
  test in assembly := {}
)
libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % "2.17.2"
libraryDependencies += "org.apache.logging.log4j" % "log4j-api" % "2.17.2"
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.17.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0" % "provided"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.0" % "test"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "org.scalaj" %% "scalaj-http" % "2.4.0"
The merge strategy below is needed only in case of conflicts between dependencies, if there are any:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case PathList("org", "slf4j", xs @ _*) => MergeStrategy.first
  case x => MergeStrategy.first
}
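To verify the override actually took effect, one option (assuming sbt 1.4 or newer, where the dependency-tree plugin ships with sbt itself) is to enable it in project/plugins.sbt:
// project/plugins.sbt (the plugin is bundled with sbt 1.4+, no extra artifact needed)
addDependencyTreePlugin
Then run sbt dependencyTree and confirm that every org.apache.logging.log4j artifact resolves to 2.17.2.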

Related

SBT cannot resolve org.apache.hadoop

Sorry, I am fairly new to Scala and sbt. Here is my build.sbt file:
name := "test_stream"
version := "0.1"
scalaVersion := "2.12.10"
resolvers in ThisBuild += Resolver.bintrayRepo("streetcontxt", "maven")
mainClass in Compile := Some("basepackage.Main")
enablePlugins(JavaAppPackaging)
enablePlugins(DockerPlugin)
libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-stream" % "2.6.1",
  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.693",
  "com.streetcontxt" %% "kcl-akka-stream" % "2.0.3",
  "me.maciejb.snappyflows" %% "snappy-flows" % "0.2.0",
  "org.xerial.snappy" % "snappy-java" % "1.1.7.3",
  "org.apache.hadoop" % "hadoop-common" % "2.10.0",
  "org.apache.hadoop" % "hadoop-core" % "1.2.1"
)
And I get the following error:
sbt.librarymanagement.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-common;2.10.0: Resolution failed several times for dependency: org.apache.hadoop#hadoop-common;2.10.0 {compile=[default(compile)]}::
Try removing ~/.sbt and ~/.ivy2 and running again.

Transitive dependency errors in SBT multi-project

I am building an sbt multi-project build, which has a common module and a logic module, with logic.dependsOn(common).
In common, Spark SQL 2.2.1 ("org.apache.spark" %% "spark-sql" % "2.2.1") is introduced. Spark SQL is also used in logic, but there I get compilation errors saying "object spark is not a member of package org.apache".
Now, if I add the Spark SQL dependency to logic as "org.apache.spark" %% "spark-sql" % "2.2.1", it works. However, if I add it as "org.apache.spark" %% "spark-sql" % "2.2.1" % Provided, I get the same error.
I don't understand why this happens. Why isn't the dependency picked up transitively from common by logic?
Here is the root sbt file:
lazy val commonSettings = Seq(
  organization := "...",
  version := "0.1.0",
  scalaVersion := "2.11.12",
  resolvers ++= Seq(
    clojars,
    maven_local,
    novus,
    twitter,
    spark_packages,
    artima
  ),
  test in assembly := {},
  assemblyMergeStrategy in assembly := {...}
)
lazy val root = (project in file(".")).aggregate(common, logic)
lazy val common = (project in file("common")).settings(commonSettings: _*)
lazy val logic = (project in file("logic")).dependsOn(common).settings(commonSettings: _*)
Here is the logic module sbt file:
libraryDependencies ++= Seq(
  spark_sql.exclude("io.netty", "netty"),
  embedded_elasticsearch % "test",
  scalatest % "test"
)
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.5",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5",
  "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.6.5",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.6.5",
  "org.json4s" %% "json4s-jackson" % "3.2.11"
)
assemblyJarName in assembly := "***.jar"

Scala/Spark: Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging

I am pretty new to Scala and Spark, and I am trying to fix my Spark/Scala development setup. I am confused by the versions and missing jars. I searched on Stack Overflow but am still stuck on this issue. Maybe something is missing or misconfigured.
Running commands:
me#Mycomputer:~/spark-2.1.0$ bin/spark-submit --class ETLApp /home/me/src/etl/target/scala-2.10/etl-assembly-0.1.0.jar
Output:
...
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
...
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
build.sbt:
name := "etl"
version := "0.1.0"
scalaVersion := "2.10.5"
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
mainClass := Some("ETLApp")
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided";
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.2" % "provided";
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided";
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.5.2";
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2";
libraryDependencies += "org.apache.curator" % "curator-recipes" % "2.6.0"
libraryDependencies += "org.apache.curator" % "curator-test" % "2.6.0"
libraryDependencies += "args4j" % "args4j" % "2.32"
java -version
java version "1.8.0_101"
scala -version
2.10.5
spark version
2.1.0
Any hints welcome. Thanks.
In that case, your jar must bring all dependent classes along when it is submitted to Spark.
In Maven this would be possible with the assembly plugin and the jar-with-dependencies descriptor. With sbt, a quick Google search turns up this: https://github.com/sbt/sbt-assembly
You can change your build.sbt as follows:
name := "etl"
version := "0.1.0"
scalaVersion := "2.10.5"
scalacOptions ++= Seq(
  "-deprecation",
  "-feature",
  "-Xfuture",
  "-encoding",
  "UTF-8",
  "-unchecked",
  "-language:postfixOps")
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.2" % Provided,
  "org.apache.spark" %% "spark-sql" % "1.5.2" % Provided,
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % Provided,
  "org.apache.spark" %% "spark-streaming-kafka" % "1.5.2" % Provided,
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2",
  "org.apache.curator" % "curator-recipes" % "2.6.0",
  "org.apache.curator" % "curator-test" % "2.6.0",
  "args4j" % "args4j" % "2.32")
mainClass in assembly := Some("your.package.name.ETLApp")
assemblyJarName in assembly := s"${name.value}-${version.value}.jar"
assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "reference.conf" => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _ => MergeStrategy.first
}
Add the sbt-assembly plugin to the plugins.sbt file under the project directory in your project's root directory. Running sbt assembly in the terminal (Linux) or CMD (Windows) from the root directory of your project will download all the dependencies for you and create an uber JAR.
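For concreteness, the project/plugins.sbt entry would look something like this (the version shown is indicative; pick one compatible with your sbt release):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
After that, sbt assembly writes the fat jar under target/scala-2.10/, and that jar can be passed to spark-submit directly.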

Why does sbt assembly in Spark project fail with "Please add any Spark dependencies by supplying the sparkVersion and sparkComponents"?

I work on an sbt-managed Spark project with a spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).
I've added the following line to build.sbt:
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
And so build.sbt looks as follows:
name := "Movie Rating"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= {
  val sparkVersion = "1.6.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
    "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
    "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
    "org.apache.kafka" %% "kafka" % "0.9.0.0",
    "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
  )
}
assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("scala", xs @ _*) => MergeStrategy.discard
  case PathList("META-INF", "maven", "org.slf4j", xs @ _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
unmanagedBase <<= baseDirectory { base => base / "lib" }
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
When I execute sbt assembly I get the following error:
java.lang.RuntimeException: Please add any Spark dependencies by
supplying the sparkVersion and sparkComponents. Please remove:
org.apache.spark:spark-core:1.6.0:provided
Probably related: https://github.com/databricks/spark-csv/issues/150
Can you try adding spIgnoreProvided := true to your build.sbt?
(This might not be the answer and I could have just posted a comment but I don't have enough reputation)
NOTE: I still can't reproduce the issue, but I think it does not really matter.
java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.
In your case, your build.sbt is missing an sbt resolver to find the spark-cloudant dependency. You should add the following line to build.sbt:
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
PROTIP: I strongly recommend using spark-shell first and switching to sbt only once you're comfortable with the package (especially if you're new to sbt, and perhaps to other libraries/dependencies too). It's too much to digest in one bite. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.
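For reference, the sbt-spark-package plugin (which produces the "Please add any Spark dependencies" error) expects Spark modules to be declared through its own keys rather than explicit libraryDependencies entries; a sketch, with values assumed from the build above:
// build.sbt, letting sbt-spark-package manage the Spark artifacts
sparkVersion := "1.6.0"
sparkComponents ++= Seq("core", "sql", "streaming", "mllib")
spIgnoreProvided := true // per the suggestion above, if you keep explicit provided deps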

sbt runtime classPath does not match compile classPath, causes java.lang.NoClassDefFoundError

The runtime classpath, according to show runtime:fullClasspath, contains only target/scala-2.11/classes and ~/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.11.7.jar.
compile:fullClasspath contains all the libraryDependencies jar locations under ~/.ivy2/cache. Why is this? I am getting java.lang.NoClassDefFoundError on sbt run.
build.sbt:
name := "my-server"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= List(
  "com.typesafe.slick" %% "slick" % "3.1.0" % "provided",
  "com.twitter.finatra" %% "finatra-http" % "2.1.0" % "provided",
  "com.roundeights" %% "hasher" % "1.2.0" % "provided",
  "com.twitter" %% "util-logging" % "6.29.0" % "provided"
)
resolvers += "Twitter" at "http://maven.twttr.com"
resolvers += "RoundEights" at "http://maven.spikemark.net/roundeights"
sbt run results:
Exception in thread "main" java.lang.NoClassDefFoundError: com/twitter/logging/Logger
sbt version 0.13.8
Removing "provided" was the fix here; I was using it incorrectly to resolve ambiguous subversions of dependencies (credit to pfn from freenode #scala).
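In other words, the dependency list becomes plain compile-scope entries, with the same coordinates as above, just without the "provided" qualifier:
// build.sbt after the fix: no "provided", so the jars land on the runtime classpath
libraryDependencies ++= List(
  "com.typesafe.slick" %% "slick" % "3.1.0",
  "com.twitter.finatra" %% "finatra-http" % "2.1.0",
  "com.roundeights" %% "hasher" % "1.2.0",
  "com.twitter" %% "util-logging" % "6.29.0"
)
With these on the runtime classpath, sbt run can load com.twitter.logging.Logger.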