SBT with Spark-Core 3.0.1 Exception NoClassDefFound for org/apache/log4j/Logger - scala

I am using Log4j with Scala 2.12.12 and Spark-Core 3.0.1, but when I change the library dependencies so that spark-core is not packaged in the fat jar, I get the following error when I try to run it:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Logger
at com.some.package.name.Utils$.setup(Utils.scala:207)
at com.some.package.name.Main$.main(Main.scala:9)
at com.some.package.name.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
Compilation succeeds, and if I remove the provided clause from the dependency line, everything works fine. My build.sbt is as follows:
scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.1" % "provided",
  "org.apache.logging.log4j" % "log4j-api" % "2.13.3",
  "org.apache.logging.log4j" % "log4j-core" % "2.13.3",
  "org.scalatest" %% "scalatest" % "3.2.0" % "test",
  "com.holdenkarau" %% "spark-testing-base" % "3.0.1_1.0.0" % Test
)
If I remove the code that writes to the logger, the SparkContext setup is then indicated as the line from which the error originates.
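For reference, if the application is being launched locally with sbt run (an assumption; under spark-submit the Spark distribution itself supplies these classes), one common sbt idiom is to keep the provided dependencies on the run classpath even though they stay out of the fat jar, which is the same trick used in the Flink build file further down:

// Keep spark-core (and the log4j 1.x it pulls in transitively) marked as provided
// for packaging, but still put it on the classpath when running the main class from sbt.
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated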

I am having a similar problem when trying to run my application on EMR 6.2.
I found a comment on the Qubole GitHub project claiming that AWS' spark-core JAR is missing org/apache/spark/internal/Logging$class, but I have no way to verify whether this is true.

Related

upgrade from Scala 2.11.8 to 2.12.10 build fails at sbt due to conflicting cross-version suffixes

I am trying to upgrade my Scala version from 2.11.8 to 2.12.10. I made the following changes in my sbt file:
"org.apache.spark" %% "spark-core" % "2.4.7" % "provided",
"org.apache.spark" %% "spark-sql" % "2.4.7" % "provided",
"com.holdenkarau" %% "spark-testing-base" % "3.1.2_1.1.0" % "test"
When I build the project with sbt, I get the following error:
[error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/user/IdeaProjects/project/"), "root"):
[error] io.reactivex:rxscala _2.12, _2.11
I tried the following, but with no luck:
1. ("io.reactivex" % "rxscala_2.12" % "0.27.0").force().exclude("io.reactivex", "rxscala_2.11")
2. Removed Scala version 2.11.8 from File -> Project Structure -> Global Libraries.
Any help would be much appreciated.
It would be best if you could post your entire build.sbt file so the problem can be reproduced, but as a starting point, the spark-testing-base dependency is pointing at the wrong Spark version. From the documentation:
So you include com.holdenkarau.spark-testing-base [spark_version]_1.0.0 and extend
Based on the information you have provided, you should be using:
"com.holdenkarau" %% "spark-testing-base" % "2.4.7_1.1.0" % "test"

How should I log from my custom Spark JAR

Scala/JVM noob here that wants to understand more about logging, specifically when using Apache Spark.
I have written a library in Scala that depends on a bunch of Spark libraries; here are my dependencies:
import sbt._

object Dependencies {
  object Version {
    val spark = "2.2.0"
    val scalaTest = "3.0.0"
  }

  val deps = Seq(
    "org.apache.spark" %% "spark-core" % Version.spark,
    "org.scalatest" %% "scalatest" % Version.scalaTest,
    "org.apache.spark" %% "spark-hive" % Version.spark,
    "org.apache.spark" %% "spark-sql" % Version.spark,
    "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test",
    "ch.qos.logback" % "logback-core" % "1.2.3",
    "ch.qos.logback" % "logback-classic" % "1.2.3",
    "com.typesafe.scala-logging" %% "scala-logging" % "3.8.0",
    "com.typesafe" % "config" % "1.3.2"
  )

  val exc = Seq(
    ExclusionRule("org.slf4j", "slf4j-log4j12")
  )
}
(admittedly I copied a lot of this from elsewhere).
I am able to package my code as a JAR using sbt package, which I can then call from Spark by placing the JAR into ${SPARK_HOME}/jars. This works great.
I now want to implement logging from my code, so I do this:
import com.typesafe.scalalogging.Logger

/*
 * stuff stuff stuff
 */
val logger: Logger = Logger("name")
logger.info("stuff")
However, when I try to call my library (which I'm doing from Python, not that I think that's relevant here), I get an error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.company.package.class.function.
E : java.lang.NoClassDefFoundError: com/typesafe/scalalogging/Logger$
Clearly this is because the com.typesafe.scala-logging library is not in my JAR. I know I could solve this by packaging with sbt assembly, but I don't want to do that because it would include all the other dependencies and make my JAR enormous.
Is there a way to selectively include libraries (com.typesafe.scala-logging in this case) in my JAR? Alternatively, should I be attempting to log using another method, perhaps using a logger that is included with Spark?
Thanks to pasha701 in the comments, I attempted to package my dependencies using sbt assembly rather than sbt package.
import sbt._

object Dependencies {
  object Version {
    val spark = "2.2.0"
    val scalaTest = "3.0.0"
  }

  val deps = Seq(
    "org.apache.spark" %% "spark-core" % Version.spark % Provided,
    "org.scalatest" %% "scalatest" % Version.scalaTest,
    "org.apache.spark" %% "spark-hive" % Version.spark % Provided,
    "org.apache.spark" %% "spark-sql" % Version.spark % Provided,
    "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test",
    "ch.qos.logback" % "logback-core" % "1.2.3",
    "ch.qos.logback" % "logback-classic" % "1.2.3",
    "com.typesafe.scala-logging" %% "scala-logging" % "3.8.0",
    "com.typesafe" % "config" % "1.3.2"
  )

  val exc = Seq(
    ExclusionRule("org.slf4j", "slf4j-log4j12")
  )
}
Unfortunately, even with the Spark dependencies marked as Provided, my JAR grew from 324K to 12M, so I opted to use println() instead. Here is my commit message:
log using println
I went with the println option because it keeps the size of the JAR small.
I trialled use of com.typesafe.scalalogging.Logger but my tests failed with error:
java.lang.NoClassDefFoundError: com/typesafe/scalalogging/Logger
because that isn't provided with Spark. I attempted to use sbt assembly
instead of sbt package but this caused the size of the JAR to go from
324K to 12M, even with spark dependencies set to Provided. A 12M JAR
isn't worth the trade-off just to use scalaLogging, hence using println
instead.
I note that pasha701 suggested using log4j instead, as that is provided with Spark, so I shall try that next. Any advice on using log4j from Scala when writing a Spark library would be much appreciated.
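For what it's worth, a minimal sketch of what that could look like with the log4j 1.x API that ships with Spark 2.x (the object name and messages below are placeholders, not anything from the original code):

import org.apache.log4j.{Level, Logger}

// Hypothetical example: reuse the log4j 1.x logger that Spark already provides on
// its classpath, so nothing extra needs to be bundled into the library JAR.
object ExampleJob {
  private lazy val log: Logger = Logger.getLogger(getClass.getName)

  def run(): Unit = {
    log.setLevel(Level.INFO)
    log.info("stuff")
  }
}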
As you said, sbt assembly will include all the dependencies in your jar.
If you only want certain ones, you have two options:
1. Download logback-core and logback-classic and add them with the --jars option of spark2-submit.
2. Specify the above dependencies with the --packages option of spark2-submit.

flink job on cluster - java.lang.NoClassDefFoundError: KeyedDeserializationSchema

I am trying out Flink and running into an exception when deploying the job on a Flink cluster (on Kubernetes).
Setup
Flink - 1.4.2
Scala - 2.11.12
Java 8 SDK
Flink docker image on cluster - flink:1.4.2-scala_2.11-alpine
SBT file
ThisBuild / resolvers ++= Seq(
  "Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/",
  Resolver.mavenLocal
)

name := "streamingTest1"
version := "0.1-SNAPSHOT"
organization := "com.example"

ThisBuild / scalaVersion := "2.11.12"

val flinkVersion = "1.4.2"

val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-java" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion % "provided"
)

lazy val root = (project in file(".")).
  settings(
    libraryDependencies ++= flinkDependencies
  )

libraryDependencies += "com.microsoft.azure" % "azure-eventhubs" % "1.0.0"
libraryDependencies += "com.google.code.gson" % "gson" % "2.3.1"

excludeDependencies ++= Seq(
  ExclusionRule("org.ow2.asm", "*")
)

assembly / mainClass := Some("com.example.Job")

// make run command include the provided dependencies
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated

// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
I can run the job locally on my machine, in IDEA or via the sbt CLI, and I can see the expected output or send the output to my sink.
When running on the Flink server deployed on a Kubernetes cluster, I see the following exception.
Error
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not run the jar.
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not run the jar.
... 9 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program caused an error:
at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:93)
at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)
at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:87)
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)
... 8 more
Caused by: java.lang.NoClassDefFoundError: org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema
at com.example.Job$.main(Job.scala:126)
at com.example.Job.main(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
... 11 more
Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more
More Background
My project has a Scala class for the Flink job, and it also has Java code in a separate directory for a connector that interfaces with EventHub. My main Scala code does not depend on KeyedDeserializationSchema, but the connector does. KeyedDeserializationSchema most likely comes from the Kafka connector dependency, which is included in my sbt file. Is there any reason the Kafka connector would not be available when running this job on the cluster?
How can I debug which version of the Kafka connector is being loaded on the server? Are there other ways of packaging that could force Flink to load the Kafka connector?
As I quickly realized after posting this question, the fix was to remove "provided" from the Kafka connector dependency:
val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-java" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion
)

java.lang.ClassNotFoundException: org.slf4j.LoggerFactory in Intellij, scala project with playframework

I have a generated Scala project (from an exposed dbt-model) imported into IntelliJ. The tests run fine in the console, but in IntelliJ I get java.lang.ClassNotFoundException: org.slf4j.LoggerFactory; more precisely:
java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at play.api.Logger$.<init>(Logger.scala:182)
at play.api.Logger$.<clinit>(Logger.scala)
at play.api.Application$class.$init$(Application.scala:272)
at play.api.test.FakeApplication.<init>(Fakes.scala:221)
at play.api.test.WithApplication$.$lessinit$greater$default$1(Specs.scala:20)
at UserTest$$anonfun$1$$anonfun$apply$1$$anon$1.<init>(UserTest.scala:10)
at UserTest$$anonfun$1$$anonfun$apply$1.apply(UserTest.scala:10)
at UserTest$$anonfun$1$$anonfun$apply$1.apply(UserTest.scala:10)Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
slf4j is attached to IntelliJ via sbt. Where is the trick?
You may have only included a dependency on the SLF4J API, but you must also include an implementation that does the actual logging work, for example:
libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.10"
I recommend using logback-classic instead:
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.2"
The exact version numbers may be out of date.

How to use playframework 2.3 with specs2 2.4 instead of specs2 2.3.x

Recently, specs2 was updated to version 2.4, which now uses scalaz 7.1 instead of 7.0.x. Once I update the specs2 dependency in my Play 2.3 project to version 2.4, all tests fail with the following exception:
[error] Uncaught exception when running ...Spec: java.lang.IncompatibleClassChangeError: Found class scalaz.syntax.FunctorOps, but interface was expected
sbt.ForkMain$ForkError: Found class scalaz.syntax.FunctorOps, but interface was expected
at org.specs2.specification.SpecificationStructure$.createSpecificationEither(BaseSpecification.scala:119)
at org.specs2.runner.SbtRunner.org$specs2$runner$SbtRunner$$specificationRun(SbtRunner.scala:73)
at org.specs2.runner.SbtRunner$$anonfun$newTask$1$$anon$5.execute(SbtRunner.scala:59)
at sbt.ForkMain$Run$2.call(ForkMain.java:294)
at sbt.ForkMain$Run$2.call(ForkMain.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Nobody seems to have had this error before; at least, I was unable to find it in the issue trackers of the specs2 and Play projects.
I made it work in Play 2.3.8 with these settings:
"org.scalaz" %% "scalaz-core" % "7.1.1",
"com.typesafe.play" %% "play-test" % "2.3.8" % "test" excludeAll(
ExclusionRule(organization = "org.specs2")
),
"org.specs2" %% "specs2-core" % "3.5" % "test",
"org.specs2" %% "specs2-junit" % "3.5" % "test",
"org.specs2" %% "specs2-mock" % "3.5" % "test"
"com.typesafe.play" %% "play-test" % "2.3.3" depends on specs2 2.3.12, and specs2 2.3.12 depends on scalaz 7.0.6
https://github.com/playframework/playframework/blob/2.3.3/framework/project/Dependencies.scala#L9-L15
https://github.com/playframework/playframework/blob/2.3.3/framework/project/Dependencies.scala#L182
https://github.com/playframework/playframework/blob/2.3.3/framework/project/Build.scala#L276
You cannot (and should not) use these together, because scalaz 7.0.6 and 7.1.0 are binary incompatible.
If you want to use Play 2 and scalaz 7.1 together, I think there are a few options:
1. Exclude the "play-test" dependency (a sketch is shown after this list): libraryDependencies ~= { _.filterNot(m => m.organization == "com.typesafe.play" && m.name == "play-test") }
2. Wait for Play 2.4: https://github.com/playframework/playframework/pull/3330
3. Rebuild the "play-test" module against scalaz 7.1: https://github.com/playframework/playframework/tree/2.3.3/framework/src/play-test/src/main/scala/play/api/test